Functional validation for agents — evidence over mocks.
Ship functional evidence, not passing tests. Every completion claim produces screenshots, API responses, and build logs that a skeptical reviewer can open.
“A successful build only proves it compiled. A passing test confirms the mock behaved correctly. ValidationForge proves the real system does the real thing.”
ValidationForge is a Claude Code plugin that replaces unit-test theater with functional validation against the real running system.
It ships as 52 validation skills, 19 slash commands, 7 enforcement hooks, 7 agents, and 9 rules files. The block-test-files hook refuses to create test files. The validation-not-compilation hook rejects completion claims that cite only a successful build. The evidence-gate-reminder hook injects an evidence checklist before any task is marked done.
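The validation-not-compilation rule can be pictured as a small predicate over the kinds of evidence a claim cites. This is a minimal sketch, not the plugin's actual hook: the evidence kind names (`build_log`, `screenshot`, `api_response`) and the function name are hypothetical.

```python
# Hypothetical evidence categories; the plugin's real names are not documented here.
BUILD_ONLY = {"build_log"}


def accepts_completion_claim(evidence_kinds):
    """Return True only if the claim cites evidence beyond a successful build."""
    kinds = set(evidence_kinds)
    if not kinds:
        # A claim with no evidence at all is rejected outright.
        return False
    # Reject when every cited item is build output (subset of BUILD_ONLY).
    return not kinds <= BUILD_ONLY
```

Under these assumptions, a claim citing `["build_log"]` is rejected, while `["build_log", "screenshot"]` is accepted.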
01. ARCHITECTURE
How it's built
MODULE 00
VALIDATE
Seven-phase pipeline from preflight to verdict. Every journey captures real evidence — screenshots, API responses, build logs — before a gate opens.
MODULE 01
CONSENSUS
N independent validators each run the full journey list. The synthesizer votes per criterion. Split verdicts escalate to a human reviewer rather than being silently downgraded.
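One way to read the per-criterion vote is sketched below. It assumes that any disagreement among validators counts as a split (the real synthesizer may use a different threshold), and the `synthesize` function, the `ESCALATE` and `MISSING` labels, and the criterion names are illustrative only.

```python
from collections import Counter


def synthesize(verdicts_by_validator):
    """Combine per-validator verdicts into one verdict per criterion.

    verdicts_by_validator: list of dicts mapping criterion -> "PASS" or "FAIL".
    Assumption: any disagreement is a split and escalates to a human.
    """
    criteria = set()
    for verdicts in verdicts_by_validator:
        criteria.update(verdicts)

    result = {}
    for criterion in sorted(criteria):
        # A validator that did not report a criterion counts as a distinct vote,
        # so an incomplete run also produces a split.
        votes = Counter(v.get(criterion, "MISSING") for v in verdicts_by_validator)
        if len(votes) == 1:
            result[criterion] = next(iter(votes))  # unanimous verdict
        else:
            result[criterion] = "ESCALATE"  # never downgraded silently
    return result
```

With two validators that agree on `login` but disagree on `checkout`, this sketch returns `PASS` for `login` and `ESCALATE` for `checkout`.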
MODULE 02
FORGE
Autonomous fix-and-revalidate loop with a 3-strike cap per journey. Each attempt targets a different root cause. Rollback on exhaustion.
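The loop described above can be sketched as follows. The callables (`validate`, `diagnose`, `apply_fix`, `rollback`) are hypothetical stand-ins supplied by the caller, and the return labels are illustrative; only the 3-strike cap, the distinct-root-cause rule, and rollback on exhaustion come from the description.

```python
def forge(journey, validate, diagnose, apply_fix, rollback, max_strikes=3):
    """Fix-and-revalidate loop with a strike cap (a sketch, not the real engine).

    validate(journey) -> bool
    diagnose(journey, tried) -> a root cause not yet in `tried`, or None
    apply_fix(journey, cause) and rollback(journey) perform side effects.
    """
    if validate(journey):
        return "PASS"

    tried = set()
    for _ in range(max_strikes):
        cause = diagnose(journey, tried)
        if cause is None or cause in tried:
            break  # no new root cause to target
        tried.add(cause)
        apply_fix(journey, cause)
        if validate(journey):
            return "PASS"

    # Strikes exhausted (or no new cause found): undo the attempted fixes.
    rollback(journey)
    return "ROLLED_BACK"
```

Tracking `tried` is what enforces that each attempt targets a different root cause rather than retrying the same fix.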
02. FEATURES
No Mocks, Ever
Enforcement hooks block creation of .test.* files, mock libraries, and in-memory test doubles. Fix the real system instead.
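A path check of the kind such a hook might perform is sketched here. Only the `.test.*` pattern comes from the description; the function name is hypothetical, and detecting mock libraries or in-memory test doubles would need content inspection that this sketch does not attempt.

```python
import fnmatch
from pathlib import PurePosixPath

# Only ".test.*" is named in the description; extend the tuple as needed.
BLOCKED_PATTERNS = ("*.test.*",)


def is_blocked_path(path):
    """Return True if the file name matches a blocked test-file pattern."""
    name = PurePosixPath(path).name  # match on the file name, not its directories
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in BLOCKED_PATTERNS)
```

Matching on the file name alone means `src/components/Button.test.tsx` is blocked while `src/test/utils.ts` is not.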
Evidence Chain of Custody
Every validation step captures screenshots, API bodies, and log output to disk. Missing files = FAIL.
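The "missing files = FAIL" rule can be illustrated with a short check over an evidence directory. This is a sketch under assumptions: the function name and file names are hypothetical, and treating an empty file as missing is an added assumption rather than documented behavior.

```python
from pathlib import Path


def evidence_verdict(required_files, evidence_dir):
    """Return ("PASS" | "FAIL", missing) for a list of required evidence files.

    Assumption: a zero-byte file counts as missing evidence.
    """
    root = Path(evidence_dir)
    missing = []
    for name in required_files:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return ("FAIL" if missing else "PASS", missing)
```

Returning the list of missing files alongside the verdict lets a reviewer see exactly which artifact was never captured.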
Three Engines
VALIDATE (beta) runs single-pass validation. CONSENSUS (V1.5) votes on contested verdicts. FORGE (V2.0) runs autonomous fix loops with rollback.
Dependency-Aware Waves
DB → API → Web/iOS. Downstream validators never run against a failing upstream.
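The wave ordering can be sketched as a list of layers run in sequence, where any failure stops everything downstream. The layer names mirror the DB, API, Web/iOS ordering above; the `run_waves` function, the `run_validator` callback, and the `SKIPPED` label are hypothetical.

```python
# Waves mirror the ordering above; validators within a wave are peers.
WAVES = [["db"], ["api"], ["web", "ios"]]


def run_waves(waves, run_validator):
    """Run validators wave by wave; skip all waves after the first failure.

    run_validator(name) -> bool is supplied by the caller.
    """
    results = {}
    upstream_ok = True
    for wave in waves:
        for name in wave:
            if upstream_ok:
                results[name] = "PASS" if run_validator(name) else "FAIL"
            else:
                results[name] = "SKIPPED"  # never run against a failing upstream
        if any(results[name] == "FAIL" for name in wave):
            upstream_ok = False
    return results
```

If the `api` validator fails in this sketch, `db` is reported as `PASS`, `api` as `FAIL`, and both `web` and `ios` as `SKIPPED`.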
03. QUICK START
04. INSTALL
- 01 claude plugin marketplace add krzemienski/validationforge
- 02 claude plugin install validationforge@validationforge
- 03 Run /vf-setup to initialize configuration + preflight checks
- 04 Run /validate-plan <journey-name> to define PASS criteria + evidence requirements
- 05 Run /validate-sweep to execute against the real system and capture evidence
- 06 Run /validate-dashboard to render the final evidence-backed verdict
05. FEATURE MATRIX
06. DOCS
07. RESOURCES
- v1.0.0 (2026-04-21): Stable VALIDATE engine, 52 skills, 19 commands, 7 hooks shipped.
- v0.9.0 (2026-03-28): Dependency-aware wave scheduling for multi-platform projects.
- v0.8.0 (2026-03-12): Evidence dashboard + per-journey verdict synthesis.