Drift Detection
Specs and code diverge silently. The 60-day audit pattern, the four-class taxonomy (dead, drifted, lying, fine), and what I do every two months to make sure my docs still describe what my code actually does.
I keep dated planning directories. They look like plans/260305-content-pipeline-v2/, plans/260420-syndication-engine/, ten of them now spanning the past year. Each one has a PLAN.md, an ARCHITECTURE.md, sometimes a HANDOFF.md, and a list of decisions from the moment I made them.
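For context, the shape of one of these directories (names from the two examples above; HANDOFF.md is the piece that's only sometimes present):

```
plans/
├── 260305-content-pipeline-v2/
│   ├── PLAN.md
│   ├── ARCHITECTURE.md
│   └── HANDOFF.md
└── 260420-syndication-engine/
    ├── PLAN.md
    └── ARCHITECTURE.md
```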
About 60 days after each plan went live, I started noticing the same thing every time: the plan was lying. The code didn't match it. The architecture had drifted. The decisions documented in the handoff had been quietly reversed by some commit that didn't update the docs.
That's drift. It's the slow divergence between what your written intent says and what your code actually does, and it accumulates whether you notice or not. I built an audit pattern for it because the alternative — assuming docs match code — kept biting me.
The Four-Class Taxonomy
When I audit a spec against code, every claim in the spec falls into one of four buckets:
- **DEAD** — the spec describes something that no longer exists. The code it referenced got deleted, the feature got removed, the file path is gone.
- **DRIFTED** — the spec describes something that exists but has changed materially. The function still has the same name; its behavior is different.
- **LYING** — the spec describes something that never existed in that form. Aspirational text someone wrote describing how things should work; the code never did.
- **FINE** — the spec describes something that exists and matches reality.
The four classes are mutually exclusive and (in my experience) collectively exhaustive. Every paragraph in a spec maps to one of them. The audit pass walks through the spec, classifies each claim, and produces a report that's actionable. The hard part is the classifier. There's no automated way to tell DRIFTED from FINE without reading the code and the spec side by side. So the audit is necessarily human-driven, with tooling assistance.
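To make the classification concrete, here is one way the taxonomy could be represented in tooling. This is a sketch: the Claim fields are my assumption about what an audit finding needs to carry, not a schema the pattern prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class DriftClass(Enum):
    DEAD = "dead"        # describes something that no longer exists
    DRIFTED = "drifted"  # exists, but has changed materially
    LYING = "lying"      # never existed in the form described
    FINE = "fine"        # matches reality


@dataclass
class Claim:
    spec_file: str       # which spec document the claim lives in
    excerpt: str         # the sentence or paragraph making the claim
    target: str          # the path, function, or behavior it refers to
    classification: DriftClass
    note: str = ""       # evidence: what was found, or what changed
```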
The 60-Day Trigger
Why 60 days? Because that's roughly the half-life of accuracy in my project specs. Across the ten dated plan directories I audited, the average spec was 90% accurate at week two, 75% accurate at week four, 50% accurate at week eight. By twelve weeks it was 30% accurate — most of the document had been overtaken by code changes.
The decay isn't linear. Somewhere between week six and week ten there's a cliff: major refactors land, decisions get reversed, and accuracy collapses. Audit before the cliff and you're maintaining minor drift. Audit after it and you're rewriting the document.
So I audit at 60 days. Calendar-driven. Every plan directory gets a check-in two months after its handoff date, regardless of whether I think it needs one. The check-ins that find nothing wrong (rare) take 20 minutes. The ones that find significant drift (most) take 1-2 hours.
The Audit Procedure
Three passes. Each pass produces a different artifact.
Pass 1: Existence check. Walk every file path mentioned in the spec. For each one, run ls and check if it exists. For each function or class name mentioned, run rg to confirm it's still in the codebase. Anything that returns "not found" is provisionally DEAD.
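In script form it's something like this, a sketch that assumes a markdown spec whose paths and identifiers appear in backticks (the extraction regexes are deliberately crude; the plan path is one of the examples above):

```python
import re
import subprocess
from pathlib import Path

spec = Path("plans/260305-content-pipeline-v2/PLAN.md").read_text()

# Backticked references: anything with a slash or a dot is treated as a
# file path, anything that parses as an identifier as a function/class.
refs = set(re.findall(r"`([^`]+)`", spec))
paths = {r for r in refs if "/" in r or "." in r}
names = {r for r in refs if r.isidentifier()}

for p in sorted(paths):
    if not Path(p).exists():
        print(f"DEAD? path not found: {p}")

for n in sorted(names):
    # rg -q exits nonzero when the word appears nowhere in the codebase
    if subprocess.run(["rg", "-q", "-w", n]).returncode != 0:
        print(f"DEAD? identifier not found: {n}")
```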
Twenty lines. Catches the cheap cases. Anything flagged here is unambiguous — the thing the spec describes literally isn't in the codebase anymore.
Pass 2: Behavior check. For each behavior claim in the spec ("returns 200 on success," "validates the user_id field," "rate-limits at 60 per minute"), find the corresponding code and read it. Confirm the behavior matches the spec or flag the divergence. This pass can't be automated. The classifier is human. The tooling helps by surfacing claim-code pairs efficiently.
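The surfacing is the only scriptable part of this pass. One illustration, my reading of "efficiently" rather than a required tool: pull each quoted claim out of the spec and let rg propose candidate files to read it against.

```python
import re
import subprocess
from pathlib import Path

spec = Path("plans/260305-content-pipeline-v2/PLAN.md").read_text()

# Behavior claims in this spec style are quoted: "rate-limits at 60 per minute"
for claim in re.findall(r'"([^"]{10,120})"', spec):
    # Use the longest token as the search anchor; crude but cheap
    anchor = max(claim.split(), key=len).strip(".,")
    hits = subprocess.run(
        ["rg", "-l", "-F", anchor],
        capture_output=True, text=True,
    ).stdout.split()
    print(f'CLAIM: "{claim}"')
    print(f"  read against: {', '.join(hits) if hits else '<no matches>'}")
```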
Pass 3: Decision check. Every decision documented in the handoff (HANDOFF.md or equivalent) gets verified. Was the decision actually implemented? Did a later commit reverse it without updating the doc? This is where LYING claims tend to surface — decisions written down that never actually shipped.
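A sketch of that pass, assuming decisions are recorded as bullet lines in the handoff (the keyword grep over commit history is coarse by design):

```python
import re
import subprocess
from pathlib import Path

handoff = Path("plans/260420-syndication-engine/HANDOFF.md").read_text()

# My handoffs record decisions as bullet lines; adjust for your format
decisions = re.findall(r"^[-*] (.+)$", handoff, flags=re.MULTILINE)

for d in decisions:
    # Take a few distinctive words and search the full commit history
    keywords = re.findall(r"[A-Za-z_]{6,}", d)[:3]
    if not keywords:
        continue
    log = subprocess.run(
        ["git", "log", "--all", "--oneline", "-i", "--extended-regexp",
         f"--grep={'|'.join(keywords)}"],
        capture_output=True, text=True,
    ).stdout
    if not log.strip():
        print(f"NO COMMIT TRAIL: {d}")
```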
Eighteen lines. Coarse but useful — surfaces decisions that have no corresponding commit history, suggesting they were either never implemented or implemented under different naming.
The Output Format
The audit produces a markdown report:
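The shape is roughly this; the counts and findings below are invented for illustration, not real audit output.

```markdown
# Drift Audit: plans/260305-content-pipeline-v2, cycle 03

## Summary

| Class   | Claims |
|---------|--------|
| DEAD    | 3      |
| DRIFTED | 9      |
| LYING   | 2      |
| FINE    | 27     |

## Findings

### DRIFTED: PLAN.md, pipeline runner
Spec says the runner takes three arguments; the code now takes four.
Fix: update the signature in the spec.

### DEAD: ARCHITECTURE.md, feed module
Spec references a module that no longer exists.
Fix: delete the paragraph.
```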
The report is checked into the project as audit-evidence/cycle-NN/REPORT.md. The numbered cycles let me track audit history. The audit-evidence/ directory accumulates over time — a record of what drifted and when, useful for spotting which parts of the spec are most prone to lying.
What Gets Found
After three audit cycles across two projects, the patterns of drift cluster:
- **Most common: implementation detail drift (DRIFTED).** A function that took three arguments now takes four. A return type that was a string is now an object. About 60% of all drift findings.
- **Second most common: renamed but functionally equivalent (DRIFTED).** A class called PipelineRunner got refactored to PipelineExecutor. Functionality identical; name changed. About 20% of findings.
- **Third: removed features (DEAD).** Something that was in the spec at handoff time got cut from the implementation, but the spec never got updated. About 10% of findings.
- **Fourth: aspirational text that never shipped (LYING).** The plan said something would happen, the work item didn't survive triage, and the spec still says it. About 10% of findings.
DRIFTED findings are the cheapest to fix — most are one-line spec edits to record the new name or signature. DEAD findings are also cheap — delete the paragraph. LYING findings are the most embarrassing because they reveal aspirations that never got implemented; the right move is usually to either implement them now or admit they're not happening.
Why This Beats "Just Update Docs As You Go"
Every team I've worked with has the policy "update the docs as you change the code." Every team I've worked with also has docs that drift. The policy doesn't work because it depends on individual discipline applied consistently across every commit, which never actually happens.
The audit pattern works because it batches the discipline into a deliberate, scheduled activity. Every 60 days (or once per plan cycle), you sit down with the spec and the code and reconcile them. The work is concentrated, the output is a clear delta, the cost is bounded.
The 60-day audit on the dated plan directories has saved me from shipping wrong information to my team three times in the past year. Each time, the spec still said something that the code had stopped doing months earlier. If a teammate had relied on the spec to make a decision, they would have been working from incorrect information. The audit caught it before that happened.
Drift is inevitable. Audits are how you keep it from compounding. Sixty days is the threshold I've found works. The four-class taxonomy is what makes the output actionable. The whole pattern fits in a single afternoon every two months, and the alternative is shipping software whose docs lie about it.