
Hooks as a Control Plane
Four hook events form the entire governance surface of Claude Code — refuse tools, gate commits, block deploys, enforce evidence. Once you see it as a control plane, the whole agent stack changes shape.
I have 3,902 hook scripts under ~/.claude/plugins/cache/ (reproduce: find ~/.claude/hooks ~/.claude/plugins -name '*.cjs' -o -name '*.js' -o -name '*.sh' | wc -l). That number jumped out at me. Most of them I never wrote. They're the enforcement layer that came along with plugins I installed for unrelated reasons: Crucible, ValidationForge, Anneal, and the dozen smaller plugins from the marketplace ecosystem. Each one shipped its own hooks, each hook quietly governing some part of every session that touches that plugin.
That's the moment I stopped thinking about hooks as "scripts that run before tool calls" and started thinking about them as a control plane. Plugins are packaging. Skills are workflows. Hooks are policy.
The Four Events
Claude Code dispatches hooks on a small set of events. It exposes others as well (SubagentStop, covered in the FAQ below, is one), but these four carry the governance work described here:
- PreToolUse: fires before a tool runs. Exit code 2 blocks the call. The hook gets the tool name, the parameters, and a chance to refuse.
- PostToolUse: fires after a tool completes. Can mutate the tool result before it returns to the model. Used for filtering, redaction, audit logging.
- Stop: fires when the agent declares the session complete. Exit code 2 prevents the session from ending. Used for verification gates.
- SessionStart: fires once at the top of a session. Sets up environment, preloads context, validates prerequisites.
All four are registered through a hooks.json file at the plugin root.
Each registration carries a matcher field, a regex that scopes the hook to specific tool names. PreToolUse matched against Bash only fires on shell commands. PreToolUse matched against Edit|Write fires on file mutations. The matcher is what makes hooks composable: you can stack ten plugins, each with its own matchers, without them stepping on each other.
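The registration example itself is not reproduced here. As a rough illustration, the Python sketch below mirrors the general shape of a hooks.json file as a dict and shows how a regex matcher narrows which hooks apply to a given tool. The script paths are hypothetical, `hooks_for` is an illustrative helper rather than anything Claude Code ships, and matching the whole tool name (instead of a substring) is an assumption.

```python
import json
import re

# Mirrors the shape of a hooks.json file. The script paths are hypothetical.
HOOKS_CONFIG = {
    "hooks": {
        "PreToolUse": [
            {"matcher": "Bash",
             "hooks": [{"type": "command", "command": "hooks/bash-guard.sh"}]},
            {"matcher": "Edit|Write",
             "hooks": [{"type": "command", "command": "hooks/protect-skills.sh"}]},
        ],
        "Stop": [
            # Stop is not tied to a tool, so this group has no matcher.
            {"hooks": [{"type": "command", "command": "hooks/completion-gate.sh"}]},
        ],
    }
}


def hooks_for(config: dict, event: str, tool_name: str = "") -> list:
    """Return the commands registered for `event` whose matcher accepts `tool_name`.

    Assumption: a matcher has to match the whole tool name, and a missing or
    empty matcher applies to every call.
    """
    commands = []
    for group in config["hooks"].get(event, []):
        matcher = group.get("matcher", "")
        if not matcher or re.fullmatch(matcher, tool_name):
            commands.extend(hook["command"] for hook in group["hooks"])
    return commands


if __name__ == "__main__":
    # The JSON you would save as hooks.json, then one lookup as a demo.
    print(json.dumps(HOOKS_CONFIG, indent=2))
    print(hooks_for(HOOKS_CONFIG, "PreToolUse", "Write"))
```

Because each group is selected independently by its own matcher, two plugins can both register PreToolUse hooks and never see each other's tool calls unless their matchers overlap.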
Crucible's Stop hook is the canonical example of governance-via-hook. Its job is to refuse session termination unless evidence/completion-gate/report.json exists and contains overall: "COMPLETE". Exit 2 keeps the session alive. There's no override flag. The agent can't /exit past it. It runs every time, and it refuses every time the evidence isn't there.
The Refusal Pattern
A hook that prints to stderr and exits 2 cancels the operation. The model sees the stderr text in its next turn, framed as a tool error. That feedback loop is what makes hooks useful as policy enforcement: the agent gets told no, gets told why, and adjusts.
Take the bash-guard hook from claude-code-discipline-hooks. It refuses any rm -rf against a path outside the project directory.
Twenty-two lines, one absolute rule. The agent can't rm -rf /tmp/foo if the working directory isn't /tmp/foo. It will get the stderr message and attempt something else, and the hook will pass the next call through if it's safe. The model adjusts; the human's invariant holds.
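The original script is not shown here, so what follows is a minimal Python sketch of the same rule, not the bash-guard source. It assumes the hook receives a JSON payload with `tool_name`, `cwd`, and `tool_input.command`, and that POSIX paths are in use. The parsing is deliberately naive: it only inspects commands that start with `rm` and use short flags, so chained commands or `sudo rm` would slip past it.

```python
import json
import os
import shlex
import sys

BLOCK = 2  # exit code that refuses the tool call


def rm_rf_targets(command: str) -> list:
    """Return the path arguments of a command that starts with a recursive, forced rm."""
    try:
        words = shlex.split(command)
    except ValueError:
        return []  # unbalanced quotes: nothing to inspect
    if not words or words[0] != "rm":
        return []
    short_flags = "".join(
        w[1:] for w in words[1:] if w.startswith("-") and not w.startswith("--")
    )
    recursive = "r" in short_flags or "R" in short_flags
    if not (recursive and "f" in short_flags):
        return []
    return [w for w in words[1:] if not w.startswith("-")]


def decide(payload: dict) -> tuple:
    """Return (exit_code, stderr_message) for one PreToolUse payload."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    project = os.path.realpath(payload.get("cwd") or os.getcwd())
    command = payload.get("tool_input", {}).get("command", "")
    for target in rm_rf_targets(command):
        resolved = os.path.realpath(os.path.join(project, os.path.expanduser(target)))
        # Anything that does not sit under the project root is refused.
        if os.path.commonpath([project, resolved]) != project:
            return BLOCK, f"bash-guard: refusing rm -rf outside {project}: {target}"
    return 0, ""


def main() -> int:
    """Hook entry point: JSON payload on stdin, refusal reason on stderr."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code


if __name__ == "__main__":
    # Standalone demo; as an installed hook this block would be sys.exit(main()).
    print(decide({"tool_name": "Bash", "cwd": "/work/proj",
                  "tool_input": {"command": "rm -rf /tmp/foo"}}))
```

Keeping the decision in `decide` and the stdin/stderr wiring in `main` makes the rule easy to exercise with hand-written payloads before it ever guards a live session.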
The compounding power shows up when you stack these. I have hooks that refuse:
- git push --force against any branch matching main|master|prod
- Any tool call that mutates files inside ~/.claude/skills/ (those are user-authored, agents don't get to write there)
- Bash commands that source untrusted files from /tmp or /var/tmp
- Edits to .env files that introduce new secrets without going through the secrets-manager skill
- Any Write to a file path containing node_modules (drift detection between intent and action)
Each one is a tiny script. None of them know about each other. Together they form a policy mesh that survives whichever agent or skill is running.
PostToolUse and the Filter Pattern
PreToolUse can refuse. PostToolUse can rewrite. The distinction matters because some governance problems aren't about blocking actions — they're about what the agent gets to see.
The redaction hook I run on Read calls is the cleanest example.
Fifteen lines. Every file the agent reads gets passed through this filter. If a config file accidentally contains a real API key, the model never sees it. The audit log keeps the original. The model gets the redacted version. This isn't theoretical: three times it has caught a secret sitting in a file I didn't realize a session would read.
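The filter itself is not reproduced here. The Python sketch below covers only the redaction step, under stated assumptions: the patterns are illustrative examples (a key=value line whose key looks secret, plus two well-known token shapes), and `redact` is a hypothetical helper. How the filtered text is handed back to the model depends on the hook output contract of your Claude Code version, which this sketch does not try to model.

```python
import re

# A line such as API_KEY=... or password: ... ; the value is masked, the key kept.
KEY_VALUE = re.compile(
    r"^([ \t]*[\w.-]*(?:api[_-]?key|secret|token|password)[\w.-]*[ \t]*[:=][ \t]*)(\S.*)$",
    re.IGNORECASE | re.MULTILINE,
)

# Token shapes that are recognizable without a key name. Illustrative, not exhaustive.
TOKEN_SHAPES = [
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key ID format
    re.compile(r"ghp_[A-Za-z0-9]{36}"),  # GitHub personal access token format
]

MASK = "[REDACTED]"


def redact(text: str) -> tuple:
    """Return (redacted_text, number_of_replacements)."""
    text, count = KEY_VALUE.subn(lambda m: m.group(1) + MASK, text)
    for pattern in TOKEN_SHAPES:
        text, hits = pattern.subn(MASK, text)
        count += hits
    return text, count


if __name__ == "__main__":
    sample = "name=demo\nAPI_KEY=abc123\n"
    print(redact(sample))
```

Returning the replacement count alongside the text gives the audit log something concrete to record without storing the secret a second time.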
PostToolUse hooks can also add context. The functional-validation skill ships a PostToolUse hook for Bash that detects when you ran a build command and appends a small note to the result: "Build success ≠ functional validation. Did you exercise the feature through real UI?" That's policy enforcement through nudging, not refusal. The agent reads the note, often acts on it, occasionally pushes back. Both behaviors are fine.
SessionStart: The Environment Contract
SessionStart hooks fire once. They set up the world the session runs in. The most useful pattern I've found is the prerequisite check — refuse to start a session if the environment isn't right.
My version is twenty-five lines. The session won't even start if the toolchain is wrong or there's leftover state from a prior crashed run. That's preferable to letting the agent boot, attempt work, and fail mid-flight. SessionStart is the place to fail fast.
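A prerequisite check along these lines could look like the Python sketch below. The tool list and the lock-file name are hypothetical placeholders, and the exit code is borrowed from the convention used for the other events; whether a failing SessionStart hook actually stops the session in your Claude Code version is an assumption carried over from the text, not something this sketch verifies.

```python
import shutil
import sys
from pathlib import Path

REQUIRED_TOOLS = ["git", "node"]   # hypothetical toolchain for this project
STALE_MARKERS = [".session.lock"]  # hypothetical leftover from a crashed run


def check_environment(project_dir, required=REQUIRED_TOOLS,
                      markers=STALE_MARKERS, which=shutil.which) -> list:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    for tool in required:
        if which(tool) is None:
            problems.append(f"missing tool on PATH: {tool}")
    for name in markers:
        if (Path(project_dir) / name).exists():
            problems.append(f"leftover state from a previous run: {name}")
    return problems


def main() -> int:
    """Hook entry point: report every problem on stderr, then fail fast."""
    problems = check_environment(Path.cwd())
    for problem in problems:
        print(f"session-preflight: {problem}", file=sys.stderr)
    return 2 if problems else 0


if __name__ == "__main__":
    # Standalone demo; as an installed hook this block would be sys.exit(main()).
    print(check_environment(Path.cwd()))
```

Collecting all problems before failing means one run tells you everything that is wrong, instead of one fix per attempt.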
What Hooks Are Not
Hooks are not skills. Skills are invoked when the user's phrasing matches a description. Hooks are invoked unconditionally on event boundaries. A skill is a routing decision; a hook is a policy.
Hooks are not plugins. A plugin is a packaging unit — it ships hooks, skills, MCP servers, agents, all bundled. The plugin is the container. The hooks are one of the things the container can hold.
Hooks are not tests. Tests check behavior. Hooks enforce constraints. The Crucible Stop hook doesn't test that the work is done — it refuses to let the session end without evidence that something checked. The hook is the enforcement; the evidence is whatever produced the proof.
The Multi-Plugin Mesh
The interesting property emerges when multiple plugins ship hooks that all match the same event. Claude Code runs them in registration order, and any single hook returning exit 2 cancels the operation. That means policies compose by union, not intersection. If Crucible's Stop hook says "no" and ValidationForge's Stop hook says "yes," the session stays open. Both have to agree before the session can end.
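The veto rule can be modeled in a few lines. The Python sketch below follows the rule exactly as stated in this article (registration order, first exit code 2 cancels); it is a toy model with stand-in hook functions, not Claude Code's dispatcher, and the hook names are only labels.

```python
from typing import Callable, Iterable, List, Tuple

# A hook takes the event payload and returns (exit_code, stderr_message).
Hook = Callable[[dict], Tuple[int, str]]


def run_hooks(hooks: Iterable[Tuple[str, Hook]], payload: dict) -> Tuple[bool, List[str]]:
    """Run hooks in order. The first exit code 2 vetoes and stops the chain."""
    log = []
    for name, hook in hooks:
        code, message = hook(payload)
        log.append(f"{name}: exit {code}" + (f" ({message})" if message else ""))
        if code == 2:
            return False, log
    return True, log


def allow(payload: dict) -> Tuple[int, str]:
    return 0, ""


def deny(payload: dict) -> Tuple[int, str]:
    return 2, "no completion evidence"


if __name__ == "__main__":
    allowed, log = run_hooks([("validationforge-stop", allow),
                              ("crucible-stop", deny)], {})
    print(allowed)
    for line in log:
        print(line)
```

The operation goes through only when every hook in the chain returns something other than 2, which is why adding a plugin can only tighten policy, never loosen it.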
I checked my installed plugins. Right now, on a session with Crucible + ValidationForge + discipline-hooks + the agent-browser skill set, my Stop event fires four hooks. PreToolUse on Edit fires three. PostToolUse on Bash fires five. The total cost: about 80ms per tool call. The benefit: every session is governed by overlapping policy without me having to maintain any of it.
Building Your Own
Start with one hook. Pick the failure mode that bites you most often. Mine was git push --force main after a rebase went sideways. Twenty-line bash script, exit 2 if the branch matches main|master|prod, registered as PreToolUse on Bash. Took ten minutes. Has refused three accidental force-pushes since.
Below are the three starter hooks I hand anyone setting up their first hooks.json. One per event class. Numbers come from time wrappers I left on the scripts for a week — observed wall-clock from a single workstation, not benchmarks. Treat them as order-of-magnitude.
1. PreToolUse — block destructive bash
My version is eight lines of logic. It catches late-night force-pushes after a bad rebase, agents trying to "clean up history" without asking, and scripts copied from blog posts. Register it under PreToolUse with matcher Bash.
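For readers who want something to start from, here is a Python sketch of a force-push guard. It is a stand-in for the bash script described above, not a copy of it. It assumes a payload with `tool_name` and `tool_input.command`, treats a branch as protected only when its whole name is main, master, or prod (so "maintenance" is not caught), and only inspects commands that begin with `git push`; chained commands and `git -C` are out of scope.

```python
import json
import re
import shlex
import subprocess
import sys

PROTECTED = re.compile(r"main|master|prod")


def forced_branches(command: str, current_branch: str = "") -> list:
    """Return the destination branches of a `git push` that uses -f or --force."""
    try:
        words = shlex.split(command)
    except ValueError:
        return []
    if words[:2] != ["git", "push"]:
        return []
    args = words[2:]
    if not any(a in ("-f", "--force") for a in args):
        return []
    positional = [a for a in args if not a.startswith("-")]
    # positional[0] is the remote; with no refspec, git pushes the current branch.
    refspecs = positional[1:] or [current_branch]
    branches = []
    for spec in refspecs:
        dest = spec.split(":")[-1].lstrip("+")  # src:dst overwrites dst
        if dest.startswith("refs/heads/"):
            dest = dest[len("refs/heads/"):]
        if dest:
            branches.append(dest)
    return branches


def decide(payload: dict, current_branch: str = "") -> tuple:
    """Return (exit_code, stderr_message) for one PreToolUse payload."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for branch in forced_branches(command, current_branch):
        if PROTECTED.fullmatch(branch):
            return 2, f"push-guard: refusing force-push to protected branch '{branch}'"
    return 0, ""


def current_git_branch() -> str:
    """Ask git for the checked-out branch; empty string if that fails."""
    try:
        result = subprocess.run(["git", "rev-parse", "--abbrev-ref", "HEAD"],
                                capture_output=True, text=True)
    except OSError:
        return ""
    return result.stdout.strip() if result.returncode == 0 else ""


def main() -> int:
    """Hook entry point: JSON payload on stdin, refusal reason on stderr."""
    code, message = decide(json.load(sys.stdin), current_git_branch())
    if message:
        print(message, file=sys.stderr)
    return code


if __name__ == "__main__":
    # Standalone demo; as an installed hook this block would be sys.exit(main()).
    print(decide({"tool_name": "Bash",
                  "tool_input": {"command": "git push --force origin main"}}))
```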
2. PostToolUse — append a validation nudge
PostToolUse hooks rewrite the tool result in place. This one fires only when a build succeeds and appends one line to what the agent reads next. In my sessions the model follows up with a real validation step about 60% of the time; the other 40% it pushes back. Both behaviors are fine — the nudge is policy, not coercion.
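The nudge logic is small enough to sketch in Python. The list of build commands below is a hypothetical starting set, and `annotate` takes the command, its output, and its exit status as plain arguments: how those are read from the PostToolUse payload, and how the annotated text is returned to the model, depend on your Claude Code version and are left out on purpose.

```python
import re

# Hypothetical set of commands that count as "a build" for this project.
BUILD_COMMAND = re.compile(r"\b(npm run build|cargo build|go build|make)\b")

NOTE = ("Build success \u2260 functional validation. "
        "Did you exercise the feature through real UI?")


def annotate(command: str, output: str, exit_code: int) -> str:
    """Append the validation note to the output of a successful build command."""
    if exit_code == 0 and BUILD_COMMAND.search(command):
        return output.rstrip("\n") + "\n" + NOTE
    return output  # failed builds and non-build commands pass through untouched


if __name__ == "__main__":
    print(annotate("npm run build", "compiled 42 modules\n", 0))
```

Failed builds are left alone because the error output is already the most useful thing the agent can read next.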
3. Stop — refuse termination without evidence
Same shape as the Crucible completion gate — refuse termination unless evidence on disk says the work is real. The minimal version checks the latest JSON has status: "PASS"; Crucible's version walks the full evidence tree against twenty-one mandatory criteria. Same pattern, different scale.
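The minimal version of that gate could be sketched in Python as follows. The `evidence` directory name is a hypothetical default, "latest" is taken to mean the most recently modified .json file, and the payload's `cwd` field is assumed to point at the project root. This is the small-scale pattern only; it makes no attempt to reproduce Crucible's criteria.

```python
import json
import sys
from pathlib import Path

EVIDENCE_DIR = "evidence"  # hypothetical location, relative to the project root


def gate(evidence_dir) -> tuple:
    """Return (exit_code, stderr_message); exit code 2 keeps the session open."""
    directory = Path(evidence_dir)
    if not directory.is_dir():
        return 2, f"completion-gate: no evidence directory at {directory}"
    reports = sorted(directory.glob("*.json"), key=lambda p: p.stat().st_mtime)
    if not reports:
        return 2, f"completion-gate: no evidence JSON in {directory}"
    latest = reports[-1]
    try:
        report = json.loads(latest.read_text())
    except (OSError, json.JSONDecodeError):
        return 2, f"completion-gate: cannot read {latest.name}"
    status = report.get("status") if isinstance(report, dict) else None
    if status != "PASS":
        return 2, f"completion-gate: {latest.name} has status {status!r}, need 'PASS'"
    return 0, ""


def main() -> int:
    """Hook entry point: JSON payload on stdin, refusal reason on stderr."""
    payload = json.load(sys.stdin)
    code, message = gate(Path(payload.get("cwd", ".")) / EVIDENCE_DIR)
    if message:
        print(message, file=sys.stderr)
    return code


if __name__ == "__main__":
    # Standalone demo; as an installed hook this block would be sys.exit(main()).
    print(gate(EVIDENCE_DIR))
```

Every failure path returns a message that names the file and the status it found, so the agent's next turn starts with a concrete reason instead of a bare refusal.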
Add the second hook when you hit the next failure. By five, the pattern is obvious: every recurring "I should have caught that" moment is a hook that doesn't exist yet. The 3,902 hooks in my cache aren't there because I'm paranoid — they're there because every plugin I installed shipped its own list of those moments.
FAQ
When does PreToolUse fire vs PostToolUse?
PreToolUse fires before the tool executes — it sees tool_input and can refuse via exit 2. PostToolUse fires after the tool returns — it sees tool_response and can rewrite what the model reads next. Use PreToolUse to block, PostToolUse to filter, redact, or annotate. A PostToolUse hook cannot stop the side effect, only shape what the agent learns from it.
What's the difference between Stop and SubagentStop?
Stop fires when the top-level session declares completion. SubagentStop fires when a spawned subagent finishes and returns to the parent. Evidence-gate patterns belong on Stop; subagent-level checks ("every research subagent must have written a report") belong on SubagentStop. Putting an evidence-gate on SubagentStop will refuse mid-conversation and stall the parent — a common first-time mistake.
How do hooks compose, and what's the ordering rule?
Multiple hooks on the same event run in registration order. Any single hook returning exit 2 cancels the operation; remaining hooks for that event do not run. Policies compose by union, not intersection — every hook gets veto power, the strictest one wins. If you need ordering guarantees, put both behaviors in the same hook rather than relying on registration order across plugins.
What's the perf cost of a typical hook?
In my setup, pure-bash PreToolUse hooks run 15-40ms p50; PostToolUse hooks doing JSON rewrites run 40-90ms p50; Stop hooks that walk an evidence tree or shell out to gh/git run 90-200ms p50. Four to six hooks on the busiest event total around 80ms per tool call, sampled across ~8,400 invocations on one workstation. Interpreter startup dominates; rewriting a hook from Node to bash typically cuts its overhead by about 3x.
“A control plane you don't have to think about is the only one that actually works.”