PART 36 · FIELD JOURNAL

12 min read·2026-06-04

Shannon v1.2.0: Multi-Stage Agentic Work as a Sequence of Provable Steps

A Claude Code plugin that refuses to say 'done' until the evidence is on disk. 36 skills, 11 agents, 22 commands, 7 hooks — and a doctor that reads its own contract.

#AgenticDevelopment #ClaudeCode #Orchestration #FunctionalValidation #Plugins

The gap between "it compiled" and "it works"

Every agent framework I have shipped eventually hits the same wall. The model reports success. The build is green. The task list is all checkmarks. And the feature does not work.

The failure is not the model being dishonest. It is the framework accepting the wrong proof. A successful build proves the code compiled. A passing unit test proves a mock behaved the way the test author imagined. Neither proves the real system did the real thing. When an agent runs unattended for an hour across planning, implementation, and validation, that gap compounds: each stage inherits the previous stage's unverified claim.

Shannon is the plugin I built to close that gap. Its one non-negotiable rule: a verdict requires real-system evidence on disk, or it is a refusal. No mocks. No stubs. No test files. The plugin would rather stop and write a REFUSAL.md citing exactly what is missing than emit a green checkmark it cannot back.

This post announces the v1.2.0 release and doubles as a usage manual. I walk every command path with real invocations, because a framework you cannot drive is a framework you will not use.

Pillar 1 · Embedded sub-agent skills: skill content inlined into the agent manifest at build time

Pillar 2 · Orchestration: single-message multi-Task (sequential / parallel / competitive)

Pillar 3 · Iron Rule validation: real-system evidence on disk for every completion claim

Pillar 4 · Meta-judge consensus: rubric YAML before scoring, threshold hidden from judges

Pillar 5 · Self-instrumented: doctor validates the contract; SDK probe confirms load

Five pillars hold it together. Every command routes through the same five — shared vocabulary, shared evidence discipline, shared refusal semantics.

What ships in v1.2.0

The surface, verified against the repository at release (scripts/doctor.py, 10/10 mechanical checks): 36 skills, each with progressive-disclosure references; 11 agents, with each agent's skills embedded inline at build time; 22 /shannon:* slash commands; and 7 hooks registered through hooks/hooks.json with a required_hooks dependency contract.

Install

// Install from the published GitHub repo

/plugin marketplace add krzemienski/shannon
/plugin install shannon@shannon

Shannon is standalone. It uses Context7 and sequential-thinking MCP servers when they are present and degrades gracefully when they are not. No required MCP servers, no external services.

// Activate enforcement per-project, then confirm the contract

/shannon:enforce on          # <on|off> [--force]
/shannon:doctor --verbose    # [--verbose] for the full per-check breakdown

/shannon:enforce on writes a .shannon/active sentinel. Shannon's hooks are no-ops in any project that has not opted in, so installing the plugin globally does not impose the Iron Rule on every repository you touch. You choose where it applies.
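
The opt-in check can be pictured as a one-line guard that every hook runs first. This is a minimal sketch, not Shannon's actual hook code: the function name `enforcement_active` is hypothetical, and only the `.shannon/active` path comes from the text above.

```python
from pathlib import Path

# Sentinel path named in the post; written by `/shannon:enforce on`.
SENTINEL = Path(".shannon") / "active"


def enforcement_active(project_root) -> bool:
    """Return True only when the project has opted in via the sentinel file.

    Hypothetical helper: a hook would call this first and return
    immediately (a no-op) when it is False.
    """
    return (Path(project_root) / SENTINEL).is_file()
```

A hook built this way costs one filesystem stat in projects that never opted in, which is what makes a global install safe.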

Check	Value	Status
plugin-manifest	-	PASS
skills-count	36	PASS
agents-count	11	PASS
commands-count	22	PASS
hooks-count	7	PASS
required-hooks-contract	-	PASS
build-state	-	PASS
skill/agent body-refs	-	PASS

/shannon:doctor at v1.2.0 — 10 checks pass, 0 fail, 0 mismatches. The version is read live from .claude-plugin/plugin.json, not hard-coded: a doctor that hard-codes the version it audits is a doctor that lies the moment the manifest drifts.
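
Reading the version live is a small amount of code. The sketch below is an illustration under assumptions, not an excerpt from scripts/doctor.py: the function names are hypothetical, and it assumes the manifest carries a top-level `version` key.

```python
import json
from pathlib import Path

# Manifest path named in the post.
MANIFEST = Path(".claude-plugin") / "plugin.json"


def read_manifest_version(plugin_root) -> str:
    """Read the version from disk on every call; nothing is cached or hard-coded."""
    text = (Path(plugin_root) / MANIFEST).read_text(encoding="utf-8")
    return json.loads(text)["version"]  # assumption: top-level "version" key


def version_check(plugin_root, claimed_version: str) -> dict:
    """Compare a version claimed elsewhere (for example in a README) to the manifest."""
    actual = read_manifest_version(plugin_root)
    status = "PASS" if actual == claimed_version else "FAIL"
    return {"check": "plugin-manifest", "value": actual, "status": status}
```

Because the manifest is the only source, a drifted claim shows up as a FAIL instead of being silently echoed back.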

Planning: four modes, one command

/shannon:plan is the entry point for every plan. If a codebase is present, it runs codebase analysis and a skill inventory first (no opt-in needed) so the plan is grounded in what actually exists, not in what the model assumes exists. Use --greenfield to skip that pre-step for a brand-new project.

The mode flag picks the planning strategy:

--mode linear (default): one hierarchical plan, written as plan.md + phase-NN.md with validation gates

--mode converge: draft → critic red-teams → rewrite, N rounds; for when plan shape is the risk

--mode tournament: N candidate plans from different angles; a judge ranks them against a rubric

--mode deep: tournament → converge → consensus, chained without a human between stages

Executing: from a plan to verified done

Once a plan exists, /shannon:cook executes it end to end. It also accepts a bare task description, which skips the plan step, and four flags: --auto (small-task fast path), --fast (skip refinement loops), --no-validate, and --greenfield.

Cook spawns an executor agent with embedded validation skills. It runs each phase, captures evidence, and routes the result through completion-gate for the mechanical final check. If a gate cites a blocker, refusal-discipline writes a REFUSAL.md with the specific cited blockers — and there is no --force flag to override it.

Autopilot: refusal-driven retry

For unattended runs, /shannon:autopilot wraps cook in a retry loop: run cook; read the completion-gate verdict; on COMPLETE exit success; on REFUSED parse the cited blockers from REFUSAL.md, build a remediation prompt targeting only those blockers, and try again; after --max-attempts still refused, emit a final REFUSAL.md and exit failure.

“Autopilot never force-completes. A run that cannot produce evidence ends in an honest refusal, not a fabricated success.”
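
The retry loop described above can be sketched in a few lines. This is an illustration of the control flow only: `run_cook` is a hypothetical callable standing in for a cook run plus the completion-gate verdict, `remediation_prompt` is an invented helper, and the default of three attempts is an assumption, not a documented default for --max-attempts.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in: run_cook(prompt) -> (verdict, cited_blockers),
# where verdict is "COMPLETE" or "REFUSED".
CookRunner = Callable[[str], Tuple[str, List[str]]]


def remediation_prompt(task: str, blockers: List[str]) -> str:
    """Build a follow-up prompt that targets only the cited blockers."""
    lines = [f"Task: {task}", "Resolve only these cited blockers:"]
    lines += [f"- {b}" for b in blockers]
    return "\n".join(lines)


def autopilot(task: str, run_cook: CookRunner, max_attempts: int = 3):
    """Return (succeeded, remaining_blockers). Never force-completes."""
    prompt = task
    blockers: List[str] = []
    for _ in range(max_attempts):
        verdict, blockers = run_cook(prompt)
        if verdict == "COMPLETE":
            return True, []
        # REFUSED: narrow the next attempt to what the gate cited.
        prompt = remediation_prompt(task, blockers)
    # Still refused after max_attempts: report the last cited blockers.
    return False, blockers
```

The only exits are a COMPLETE verdict or an exhausted attempt budget; there is no branch that converts a refusal into success.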

When fan-out outgrows a turn: Dynamic Workflows

Anthropic shipped Dynamic Workflows in Claude Code — a JavaScript control program the runtime executes in the background, outside the conversation turn. Shannon's orchestration cluster knows when to reach for it. In-turn dispatch is right for 2–8 independent targets whose results you need together now. But when the work-list is large or unbounded, or when the run must survive a context interrupt, Shannon now recommends emitting a Workflow instead:

Signal in the task	Right substrate
2–8 independent sub-tasks, results needed this turn	/shannon:dispatch-parallel
Sequential tasks needing review between each	subagent-driven-development
Dozens of agents, or fan-out over an unknown-size work-list	Dynamic Workflow pipeline()
Must resume across a context interrupt	Dynamic Workflow (journaled, resumeFromRunId)
Loop-until-dry / loop-until-budget discovery	Dynamic Workflow while + budget

The escalation is advisory: Shannon plans the orchestration and names the Workflow shape; you run it. The Iron Rule survives the handoff — a Workflow's agent() calls produce the same real-system evidence, and completion-gate still cites it.
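
The signal table reads naturally as a decision function. The sketch below is one possible encoding, not Shannon's routing logic: the `TaskShape` fields, the precedence between signals, the cutoff of 8 for escalating to a pipeline, and the single-target fallback string are all assumptions made for illustration. Only the returned substrate names come from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskShape:
    targets: Optional[int]  # None means an unknown-size work-list
    needs_review_between: bool = False
    must_survive_interrupt: bool = False
    loops_until_dry_or_budget: bool = False


def choose_substrate(shape: TaskShape) -> str:
    """Map task signals to a substrate name (precedence order is assumed)."""
    if shape.must_survive_interrupt:
        return "Dynamic Workflow (journaled, resumeFromRunId)"
    if shape.loops_until_dry_or_budget:
        return "Dynamic Workflow while + budget"
    if shape.targets is None or shape.targets > 8:
        return "Dynamic Workflow pipeline()"
    if shape.needs_review_between:
        return "subagent-driven-development"
    if shape.targets >= 2:
        return "/shannon:dispatch-parallel"
    return "single task, no fan-out"  # not covered by the table; assumed fallback
```

For example, `choose_substrate(TaskShape(targets=4))` selects in-turn parallel dispatch, while an unknown-size work-list escalates to a Workflow pipeline.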

Validation: the Iron Rule in practice

/shannon:validate runs functional validation against the real system and emits per-criterion verdicts, each citing a specific evidence file. It detects the platform (override with --platform ios|web|api|cli|fullstack), captures screenshots / API responses / logs to e2e-evidence/<run-id>/, and emits PASS/FAIL per criterion. --mode consensus spawns 3 isolated validators plus a confidence-scored synthesis.
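
A per-criterion verdict that cites its evidence might look like the sketch below. It is an assumption-laden illustration: the function names, the dictionary shape, and the definition of "fresh" (file exists, is non-empty, and was modified at or after the run started) are mine, not taken from Shannon's source.

```python
from pathlib import Path


def evidence_is_fresh(evidence_file, run_started_at: float) -> bool:
    """True if the file exists, is non-empty, and was written during this run.

    run_started_at is a Unix timestamp in seconds (assumed definition of fresh).
    """
    path = Path(evidence_file)
    if not path.is_file():
        return False
    stat = path.stat()
    return stat.st_size > 0 and stat.st_mtime >= run_started_at


def verdict_for(criterion: str, evidence_file, run_started_at: float, observed_ok: bool) -> dict:
    """Emit PASS only when fresh evidence exists AND the observation succeeded."""
    cited = str(evidence_file)
    if not evidence_is_fresh(evidence_file, run_started_at):
        return {"criterion": criterion, "verdict": "FAIL",
                "evidence": cited, "reason": "missing or stale evidence"}
    if not observed_ok:
        return {"criterion": criterion, "verdict": "FAIL",
                "evidence": cited, "reason": "evidence contradicts the criterion"}
    return {"criterion": criterion, "verdict": "PASS", "evidence": cited, "reason": ""}
```

Every returned verdict carries the evidence path, so a reader can open the cited file instead of taking the verdict on trust.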

The enforcement is not a guideline — it is hooks that fire at the moment of the Write:

01 The fab-file hook refuses any Write to *.test.*, *.spec.*, tests/, __tests__/, mocks/, or stubs/. You cannot quietly retreat to a mock, because the file never lands.
02 The post-action hook reminds, after every build command, that compilation is not validation.
03 The evidence-gate hook fires on TaskUpdate(status=completed) and checks for fresh evidence. No evidence, no completion.
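
The path test behind the first hook can be sketched with the standard library. This is a minimal illustration, assuming the patterns listed above are matched against the file name and the parent directories; `is_fabrication_path` is a hypothetical name and the real hook may match differently.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Patterns taken from the list above.
BLOCKED_NAME_GLOBS = ("*.test.*", "*.spec.*")
BLOCKED_DIRS = {"tests", "__tests__", "mocks", "stubs"}


def is_fabrication_path(path: str) -> bool:
    """Return True if a Write to this path should be refused."""
    p = PurePosixPath(path.replace("\\", "/"))  # normalize Windows separators
    if any(fnmatch(p.name, glob) for glob in BLOCKED_NAME_GLOBS):
        return True
    # Any parent directory named tests/, __tests__/, mocks/, or stubs/.
    return any(part in BLOCKED_DIRS for part in p.parts[:-1])
```

Under these assumptions `src/login.test.ts` and `src/__tests__/a.js` are refused, while `src/login.ts` and `src/contest.py` pass through.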

// REFUSAL.md — cited blockers and what would resolve them

# REFUSAL — user signup flow

## Cited blockers
- MSC-3 (email confirmation): no evidence at e2e-evidence/<run>/step-04-confirm.png
- MSC-5 (session persists): API response at e2e-evidence/<run>/step-06-session.json
  shows 401, expected 200

## What would resolve it
- Capture the confirmation-page screenshot after clicking the emailed link
- Fix the session cookie not being set on the /login response, then re-validate

Refusal is a feature. An agent that refuses with cited blockers is more useful than one that claims success you then have to disprove.

Self-instrumentation: the plugin that checks itself

The two harnesses under scripts/ are how Shannon stays honest about its own surface. scripts/doctor.py is the mechanical contract — the 10-check JSON above, the fast no-API check you run after any change. scripts/harness/load_check.py is the real one: it issues an actual Claude Agent SDK query against the running plugin and reads the init payload to confirm every command is addressable. It needs no API key — the SDK inherits the Claude Code CLI's own authentication.

// load_check.py --json at v1.2.0

{
  "total_slash_commands": 978,
  "shannon_commands_loaded": 57,
  "expected_commands": 22,
  "missing_commands": [],
  "verdict": "PASS"
}

missing_commands: [] is the assertion that matters: every one of the 22 commands the repository ships is addressable in the loaded plugin. The probe earned its keep during an earlier release — one run returned verdict: FAIL with missing_commands: ["enforce", "scope"], which was not a bug in the commands but proof that the CLI was loading a stale cached build instead of the release tree. The probe caught the mispointing; fixing the marketplace pointer turned the FAIL into a PASS. That is self-instrumentation doing its job.
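
The verdict logic reduces to a set difference. The sketch below is not load_check.py: it assumes the probe has already extracted a flat list of slash-command names from the init payload (the payload's real shape is not shown in this post), and the function name and `shannon:` prefix handling are illustrative.

```python
from typing import Iterable, List


def load_verdict(loaded_slash_commands: List[str],
                 expected_commands: Iterable[str],
                 prefix: str = "shannon:") -> dict:
    """Report which expected commands are not addressable in the loaded plugin."""
    loaded = set()
    for name in loaded_slash_commands:
        bare = name.lstrip("/")  # accept names with or without a leading slash
        if bare.startswith(prefix):
            loaded.add(bare[len(prefix):])
    expected = set(expected_commands)
    missing = sorted(expected - loaded)
    return {
        "total_slash_commands": len(loaded_slash_commands),
        "shannon_commands_loaded": len(loaded),
        "expected_commands": len(expected),
        "missing_commands": missing,
        "verdict": "PASS" if not missing else "FAIL",
    }
```

A stale cached build shows up exactly as in the story above: commands present in the repository but absent from the loaded list land in `missing_commands`, and the verdict flips to FAIL.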

The rest of the surface

The commands beyond plan/cook/dispatch/validate round out the lifecycle:

01 /shannon:fix: a 3-strike bug-fix runner; each attempt targets a different root cause and writes fresh evidence.
02 /shannon:loop: a do / verify / reflect convergence loop.
03 /shannon:team: multi-teammate orchestration with file-ownership boundaries so no two agents write the same file.
04 /shannon:research: parallel researcher fan-out with a cited summary.
05 /shannon:prd: interview-driven PRD authoring.
06 /shannon:reflect: self-refinement (self / critique / memorize).
07 /shannon:why: five-whys root-cause analysis.
08 /shannon:trace and /shannon:retro: session-JSONL timeline and retrospective mining.
09 /shannon:resume: resume a halted run from its evidence tree, no separate state file required.
10 /shannon:audit: read-only audit across screen / app / session / drift / completion-evidence.
11 /shannon:scope: brownfield reconnaissance: codebase + skill inventory + session context.

Every one routes through the same five pillars. The vocabulary is shared, the evidence discipline is shared, and the refusal semantics are shared.

Why a stable label, and not v7

Shannon went through informal iterations numbered up to v7 in private. v1.0.0 was the first version I was willing to put a stable label on — not because the earlier work was wasted, but because it was the first cut where the surface, the documentation, and the self-instrumentation all agreed with each other. The doctor reads the manifest. The README counts match the directories. The SDK probe confirms the commands load. When the three sources of truth agree, the version number means something.

v1.2.0 is the second stable release on that same discipline: the surface grew to 36 skills, 11 agents, 22 commands, and 7 hooks — adding DeepPlan with dependency-ordered wave execution, the evidence-gated forge pipeline, and the skills10x activation harness — and /shannon:doctor still reports 10/10 with zero mismatches. The numbers only moved because the directories did, and the doctor proves it on every run. Install it, run /shannon:doctor, and watch a plugin check its own contract before you trust it with yours.