What 11.6 GB of session data across 27 projects reveals about building production software with AI agents
Why multi-agent consensus catches what solo review cannot
Why functional validation replaced testing when AI writes the code
Token-by-token Claude streaming on iOS — after four failed architectures
Five battle-tested Swift patterns from building three iOS apps with AI agents — covering state management, memory profiling, iCloud sync, Keychain security, and multi-simulator validation
Git worktrees give each AI agent its own filesystem — a five-stage pipeline brings their work back together
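The isolation step, sketched in TypeScript (the path layout and `agent/` branch prefix are illustrative, not the pipeline's real naming):

```typescript
import { execFileSync } from "node:child_process";

// One worktree per agent: shared object store, isolated working tree and
// branch, so parallel agents never touch each other's checkouts.
function addAgentWorktree(repoPath: string, agentName: string): string {
  const worktreePath = `${repoPath}-worktrees/${agentName}`;
  execFileSync("git", [
    "-C", repoPath,
    "worktree", "add",
    "-b", `agent/${agentName}`, // fresh branch per agent
    worktreePath,
  ]);
  return worktreePath; // hand this to the agent as its working directory
}
```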
How layered enforcement turned written rules into mechanical discipline across 23,479 AI coding sessions
One hat per session, one event per hat, and the 1:47 AM guidance command that finished 28 tasks by morning
Building a pipeline to extract patterns from 11.6 GB of Claude Code session data — and discovering the series was using the wrong numbers
269 AI-generated screens, zero Figma files, and the branding bug that taught me to treat prompts as build artifacts
A single YAML file replaces meetings, tickets, and Slack threads — agents read it, build in parallel, and ship without asking clarifying questions
A SQLite observation store and MCP memory server that turn 23,479 sessions of amnesia into searchable institutional knowledge
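A smallest-useful sketch of that store, assuming better-sqlite3; the table and column names are my illustration, not the real schema:

```typescript
import Database from "better-sqlite3";

const db = new Database("observations.db");
db.exec(`
  CREATE TABLE IF NOT EXISTS observations (
    id          INTEGER PRIMARY KEY,
    session_id  TEXT NOT NULL,   -- which JSONL session produced it
    project     TEXT NOT NULL,
    kind        TEXT NOT NULL,   -- e.g. decision | gotcha | pattern
    body        TEXT NOT NULL,
    created_at  TEXT DEFAULT (datetime('now'))
  )
`);

// The memory server wraps two calls: remember and recall.
export const remember = (session: string, project: string, kind: string, body: string) =>
  db.prepare(
    "INSERT INTO observations (session_id, project, kind, body) VALUES (?, ?, ?, ?)"
  ).run(session, project, kind, body);

export const recall = (term: string) =>
  db.prepare("SELECT * FROM observations WHERE body LIKE ?").all(`%${term}%`);
```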
How structured hypothesis-test-revise chains solve bugs that brute-force debugging never will
Parallelism gets agents working at the same time. Choreography is what keeps their work from eating itself alive when they finish.
How SKILL.md files turn repeatable workflows into invocable prompt programs — and the factory that generates them
Hooks, skills, and the enforcement layer that turns agent suggestions into hard stops across 23,479 sessions
Four generations of Claude Code builders — each one a lesson that cost real money and real patience to learn
When to call the API directly, when to use Claude Code CLI, and when you need both — a practical guide from the series finale
Linear, tournament, and convergence pipelines that turn rough ideas into executable plans
The gate between 'I did the work' and 'the work is done' — 10 phases, 3 reviewers, 3 oracles, zero override flags.
Token spend per task complexity, when Opus actually pays back over Sonnet, and the cost-of-defect curve that makes "use the cheap model" the most expensive choice you can make
Four hook events form the entire governance surface of Claude Code — refuse tools, gate commits, block deploys, enforce evidence. Once you see it as a control plane, the whole agent stack changes shape.
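What one refusal looks like as a PreToolUse hook, assuming Claude Code's documented contract (hook payload as JSON on stdin, exit code 2 to block with stderr routed back to the agent); the deploy rule itself is invented for illustration:

```typescript
// pre-tool-use.ts: runs before every tool call the agent attempts.
import { readFileSync } from "node:fs";

const event = JSON.parse(readFileSync(0, "utf8")); // hook payload on stdin
const command: string = event.tool_input?.command ?? "";

if (event.tool_name === "Bash" && /deploy\s+--prod/.test(command)) {
  // Exit code 2 refuses the call; stderr becomes feedback to the agent.
  console.error("Blocked: production deploys go through the release gate.");
  process.exit(2);
}
process.exit(0); // everything else passes through
```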
Cache reads cost a tenth what cache writes cost, and most agents leave that 90% discount on the table because nobody structures their system prompt for hits. Here's how to order your messages so the cache pays you back.
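A sketch of that ordering with the Anthropic TypeScript SDK: stable prefix first, cache breakpoint at its end, volatile turn last. The model ID and prompt constants are placeholders:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const STABLE_SYSTEM = "...rules, style guide, project context..."; // placeholder
const userTurn = "..."; // the only part that changes per call

const response = await client.messages.create({
  model: "claude-sonnet-4-5", // placeholder model ID
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: STABLE_SYSTEM,
      // Breakpoint: everything up to and including this block is cacheable.
      // Writes bill above base input; reads bill at roughly a tenth of it.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: userTurn }], // volatile, after the prefix
});
```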
When the prebuilt MCP servers run out of road, you write your own. The protocol is a four-method handshake, the transport is stdio or HTTP, and the whole thing fits in 200 lines of TypeScript.
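With the official SDK carrying the protocol, the skeleton is a dozen lines; this sketch assumes @modelcontextprotocol/sdk, and the echo tool is a stand-in:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "demo", version: "1.0.0" });

// The SDK answers initialize / tools/list / tools/call; you write handlers.
server.tool(
  "echo",
  "Echo a string back (stand-in for a real tool).",
  { text: z.string() },
  async ({ text }) => ({
    content: [{ type: "text" as const, text: `echo: ${text}` }],
  })
);

await server.connect(new StdioServerTransport()); // stdio; HTTP also exists
```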
AGENTS.md, SKILL.md, CLAUDE.md — three doc formats nobody asked for, written for an audience that doesn't read narrative. The 95:5 ratio between human-targeted and agent-targeted docs is about to flip.
The pattern: secrets enter the agent's environment at the tool boundary, not in the prompt. Vault-injected env. 1Password op-run envelopes. Hooks that scrub before write. Your session JSONL gets archived; nothing in there should be sensitive.
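The scrub-before-write piece as a PreToolUse hook sketch; the secret patterns are illustrative shapes, and the Write/Edit input field names are my reading of Claude Code's tools:

```typescript
// scrub-before-write.ts: refuse any file write whose payload looks like a secret.
import { readFileSync } from "node:fs";

const SECRET_SHAPES = [
  /sk-ant-[A-Za-z0-9_-]{20,}/,          // Anthropic key shape
  /AKIA[0-9A-Z]{16}/,                   // AWS access key ID shape
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // PEM private key
];

const event = JSON.parse(readFileSync(0, "utf8"));
const payload: string =
  event.tool_input?.content ?? event.tool_input?.new_string ?? "";

if (["Write", "Edit"].includes(event.tool_name) &&
    SECRET_SHAPES.some((re) => re.test(payload))) {
  console.error("Blocked: looks like a secret. Inject it via env at the tool boundary.");
  process.exit(2); // refuse the write; the reason goes back to the agent
}
```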
The JSONL session file is the primary deliverable, not the resulting code. Replayable, line-addressable, auditable — 11.6 GB across 538 directories of permanent record telling you exactly how every commit got written.
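Line-addressable means replay is a loop, not a product. A sketch (the per-line `type` field is an assumption; inspect your own files):

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Walk a session file one JSONL line at a time; the line number is the address.
async function replay(path: string): Promise<void> {
  const lines = createInterface({ input: createReadStream(path) });
  let n = 0;
  for await (const line of lines) {
    n++;
    if (!line.trim()) continue;
    const event = JSON.parse(line);
    console.log(`${path}:${n}`, event.type ?? "unknown");
  }
}
```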
Six banned phrases, three syntactic patterns, one cosine-similarity fingerprint, and a humanize loop that rewrites flagged passages until the voice matches. The gate every post in this series passes through.
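The mechanical half of the gate fits in a few functions; the phrase list and threshold here are stand-ins, not the real six or the real cutoff:

```typescript
// Banned-phrase check plus cosine distance from a reference voice embedding.
const BANNED = ["delve into", "tapestry", "game-changer"]; // stand-ins

export const bannedHits = (text: string): string[] =>
  BANNED.filter((p) => text.toLowerCase().includes(p));

export function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// A passage passes when it is clean of banned phrases and close enough to
// the voice fingerprint; otherwise it loops back through the humanizer.
export const passes = (emb: number[], voice: number[], text: string): boolean =>
  bannedHits(text).length === 0 && cosine(emb, voice) >= 0.8; // threshold assumed
```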
Seventy-one marketplaces subscribed. One hundred thirty-two plugins installed. Twenty-six marketplaces installed-but-unused. The discovery overhead is the bottleneck — and it's getting worse.
Agents finish at three in the morning. Severity model, channel routing, the exhausted-state-fires-alert pattern. The notification layer that lets you sleep while your work runs.
Specs and code diverge silently. The 60-day audit pattern, the four-class taxonomy (dead, drifted, lying, fine), and what I do every quarter to make sure my docs still describe what my code actually does.
Anthropic returns 529 (overload) intermittently. Naive retries themselves consume the bucket and storm the API a hundred times over. The retry policy that costs less, not more.
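The fix is bounded, jittered backoff that honors Retry-After instead of racing it. A sketch, assuming an error shape like the Anthropic SDK's APIError (a status code plus response headers):

```typescript
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const retryable = err?.status === 529 || err?.status === 429;
      if (!retryable || attempt >= maxAttempts) throw err; // bounded, not infinite

      // Prefer the server's Retry-After; else full-jitter exponential backoff.
      const retryAfterMs = 1000 * Number(err?.headers?.["retry-after"] ?? 0);
      const ceiling = Math.min(30_000, 1_000 * 2 ** (attempt - 1));
      const delayMs = retryAfterMs > 0 ? retryAfterMs : Math.random() * ceiling;
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }
}
```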