PART 21 · FIELD JOURNAL

19 min read read·2026-05-05

The Economics of a Session

Token spend per task complexity, when Opus actually pays back over Sonnet, and the cost-of-defect curve that makes "use the cheap model" the most expensive choice you can make

View companion repo

#AgenticDevelopment #ClaudeCode #Economics #PromptEngineering

The first time I looked at the cost ledger for a single project, I assumed it was a bug. One feature, three days of work, $47 in API spend. Then I looked at the next project. $89. Then a third — $112 for what should have been a one-afternoon refactor.

Across 23,479 sessions in ~/.claude/projects/, the line that kept showing up in the JSONL was the same: model: claude-opus-4-5, cache_creation_input_tokens: 41,228, cache_read_input_tokens: 0. Forty-one thousand tokens of cache miss, every single turn.

That's the actual shape of the problem. People talk about 'the cost of running Claude Code' like it's a function of how much you use it. It isn't. It's a function of which model you picked, what you cached, and whether you let the planner think before the executor moved.

What a Session Actually Costs

The Anthropic API ships usage metadata in every assistant response. If you tail a .jsonl file in ~/.claude/projects/<project>/<session>.jsonl you see this on every turn:

// ~/.claude/projects/-Users-nick-Desktop-blog-series/<session>.jsonljson

{
  "type": "assistant",
  "message": {
    "model": "claude-opus-4-5",
    "usage": {
      "input_tokens": 4,
      "cache_creation_input_tokens": 41228,
      "cache_read_input_tokens": 0,
      "output_tokens": 387
    }
  }
}

Four fields, four price points. input_tokens are the most expensive (full base rate). cache_creation_input_tokens cost 1.25x base for the privilege of being cached for the next hour. cache_read_input_tokens cost 0.1x base — a 90% discount. output_tokens are typically 5x input.

Multiply that by Opus's posted rates and a single turn that reads 41k tokens of cache-miss context costs about $0.77. Stack 60 turns into a deep refactor and you're looking at $46 just on input tokens alone, before output. Now do that for every session in a project. Now do it across 23 projects.

The session-insight-miner I built specifically to walk these JSONL files exposes the distribution. The numbers in this section are from that miner across the 23,479-session corpus:

SESSION INPUT-TOKEN DISTRIBUTION (23,479 SESSIONS)

Percentile	Input tokens	Notes
Median	184k	A typical day's session
P95	2.1M	Deep refactor or extended audit
P99	8.7M	Full-codebase rewrite territory
Max	14.3M (312 turns)	Single session that cost more than a monthly Vercel bill

The geometric scale between median and tail is the entire story. A median session is cheap. The long tail is where the bills come from.

The Cost-of-Defect Curve

Here's the mistake I made for two solid months: I used Sonnet for planning because Opus felt expensive. The math seemed obvious. Sonnet was about a fifth the price per token. So plan with Sonnet, execute with Sonnet, save four-fifths.

What actually happened: Sonnet would draft a plan that missed an obvious constraint. The executor (also Sonnet) would faithfully implement the missed-constraint plan. The validator would fail. I'd go back, ask for a revision, watch Sonnet patch around the gap with three more turns of edits. Then the second-order problem would surface — the patch broke something else — and I'd burn six more turns chasing the cascade.

Tally those turns: a 12-turn debug spiral on Sonnet at ~120k input tokens per turn averages out to about $4.30. The original 'expensive' Opus plan would have been one turn at maybe $0.90. The cost-of-defect curve is real, and it bends viciously.

I started measuring this directly. I'd take the same task, plan it twice — once with Opus, once with Sonnet — then execute both plans through identical Sonnet executors. The Opus-planned variants finished in a median of 7 turns. The Sonnet-planned ones took 19. Even with Opus costing roughly 5x Sonnet on planning, the total session cost was lower with Opus 73% of the time.

PLAN-THEN-EXECUTE COST COMPARISON (SAME TASK, IDENTICAL EXECUTOR)

Planner	Plan cost	Executor turns	Patch turns	Total cost
Opus	1 turn ~ $0.90	7 turns ~ $0.70	0	~ $1.60
Sonnet	1 turn ~ $0.20	19 turns ~ $1.90	12 turns ~ $3.40 (defect cascade)	~ $5.50

“Pay for thinking, not typing. Opus is the thinking budget. Sonnet is the typing budget.”

If you blur the two, you end up paying typing prices to debug thinking failures, and the bill is much higher than just buying the thinking outright.

When Sonnet Actually Wins

Sonnet wins when the task is shallow. Shallow doesn't mean small. It means the constraints are obvious from the prompt, the codebase is well-understood by the executor, and the search space is narrow.

Concrete examples from the JSONL corpus where Sonnet beat Opus on total cost:

01Renaming a function across 14 files (Sonnet 3 turns at ~$0.18 vs Opus 3 turns at ~$0.81)
02Adding logging statements to an existing handler (Sonnet 1 turn vs Opus 1 turn — same number of turns, Opus 5x the bill)
03Translating Python to TypeScript line-by-line for a 200-line file (Sonnet 4 turns vs Opus 4 turns)
04Generating boilerplate React components from a JSON schema (Sonnet, not even close)

What these tasks share: there's no architectural decision to make. You're not choosing between approaches. You're transcribing intent into syntax. That's a typing job. Sonnet is good at typing.

Where Opus reliably wins, even on raw cost:

01Anything involving a state machine with more than four states
02Refactors that touch more than two layers of the application
03Performance work where you have to reason about cache locality, concurrency, or I/O patterns
04Anything where the user said 'I don't know how to fix this'
05Cross-module bug hunts where the root cause is two layers deep

The signal I learned to watch: how often the plan changes mid-execution. If your plan survives contact with the codebase, your planner was good enough. If the executor sends back 'this won't work because...' three times in a row, you under-paid for the planner. Pay more next time.

The Cache Question

Cache is the other lever, and it's the one most people leave alone because it's invisible from the outside. The Claude Code system prompt is roughly 9,300 tokens before any project context loads. Your CLAUDE.md adds another 2-6k. Skills loaded by trigger add 200-2,000 each. Project files referenced in conversation add whatever they add.

The naive default writes all of that to the cache once and then re-reads from cache for the rest of the session. Cache reads cost 0.1x base. The first turn pays the 1.25x cache-creation surcharge, every turn after pays the 0.1x discount, and you're up roughly 88% over the no-cache baseline.

That's the happy path. Here's the unhappy one. Every time the cached prefix changes, the cache invalidates and the next turn pays full creation cost again. If you hand-edit your CLAUDE.md mid-session, you've just bought a fresh cache. If a tool call returns a result that gets injected into the system context, sometimes — not always — you've bought a fresh cache. If you /clear, you've bought a fresh cache.

I started instrumenting the JSONL files for cache-miss patterns. Across 200 sessions sampled at random, 43% had at least one mid-session cache invalidation that wasn't user-triggered. The biggest culprit: skills that load mid-session and inject content into the system prompt area. Once a skill loads, the next turn often shows cache_creation_input_tokens ticking up by however many tokens the skill body added.

CACHE COST RATIOS

Operation	Multiplier vs base input	Effect over 60 turns
No cache (every turn full price)	1.0x	Baseline
Cache create (first turn)	1.25x	One-time surcharge
Cache read (subsequent turns)	0.1x	Up to 88% savings

The expensive sessions in the long tail almost always have multiple cache invalidations. The cheap sessions have one creation event followed by 60 reads. Same number of tokens flowing through the model, an order of magnitude difference in cost.

The pattern I settled on: pre-load the skills you'll need at session start, even if you don't call them. The cache pays for itself across 5+ reads. Loading a 1,500-token skill once costs you about 1,875 cache-creation tokens and saves you 9,000 cache-read tokens per turn for the remaining 5 turns. Net savings show up immediately.

The Plan-Then-Execute Math

The cheapest sessions I have on record share a structure. One Opus turn at the top to draft a plan. The plan goes into a markdown file in the working directory. Then a Sonnet executor reads the plan and executes it across 4-12 turns. Total cost: one expensive turn plus several cheap ones.

The pricing math:

// scripts/cost-model.py · approximate per-turn cost in USDpython

# from: scripts/cost-model.py:14-32
COST = {
    "opus": {
        "input": 15.00 / 1_000_000,
        "cache_create": 18.75 / 1_000_000,
        "cache_read": 1.50 / 1_000_000,
        "output": 75.00 / 1_000_000,
    },
    "sonnet": {
        "input": 3.00 / 1_000_000,
        "cache_create": 3.75 / 1_000_000,
        "cache_read": 0.30 / 1_000_000,
        "output": 15.00 / 1_000_000,
    },
}

def turn_cost(model: str, usage: dict) -> float:
    rates = COST[model]
    return (
        usage["input_tokens"] * rates["input"]
        + usage["cache_creation_input_tokens"] * rates["cache_create"]
        + usage["cache_read_input_tokens"] * rates["cache_read"]
        + usage["output_tokens"] * rates["output"]
    )

Run that against a real session and the picture clarifies. A 60-turn Sonnet-only session that did everything (plan and execute) on Sonnet: ~$8.40. A 7-turn Opus-plan + 12-turn Sonnet-execute version of the same task: ~$3.10. Less than half. Better outcome. Faster wall clock.

THREE REAL-SESSION COST PROFILES

Strategy	Turns	Total cost	Outcome
Sonnet-only (plan + execute)	60	~ $8.40	Worked, but with cascading patches
Opus-plan + Sonnet-execute	7 + 12	~ $3.10	Cleanest result; less than half the bill
Opus across the entire session	312	~ $94	Most expensive in corpus; Sonnet would have done it for ~$19

Chart · animated barsThree real-session cost profiles · USD per session

10,550 total

Opus plan + Sonnet exec

0

Sonnet-only (cascade)

0

All-Opus (312 turns)

0

Cents per session. Pay for thinking once, type with Sonnet. All-Opus is the path that always loses.

The version that always loses: Opus across the entire session. It's the model that should be doing the thinking, not the typing, and using it for typing is just lighting money on fire. The single most expensive session in my corpus was 312 Opus turns. About $94. Sonnet would have done the same work for about $19.

Where the dollars go · all-Opus 312-turn session (~$94) · total 94

Output tokens (5× input)50%

Cache-create surcharge (1.25×)23%

Fresh input (1.0×)19%

Cache reads (0.1×)7%

Output dwarfs everything when the model is doing the typing. Sonnet-execute would have collapsed the bar to ~$19.

What I Watch Now

Three numbers, on a dashboard, refreshed daily:

THE THREE-NUMBER DASHBOARD

Metric	Trigger threshold	What it tells you
Median tokens per turn	> 60k	Context is bloated; clear something
Cache-read ratio	< 80%	Invalidating cache too often (likely a mid-session skill load)
Cost-of-defect (turns past plan completion / total turns)	> 0.4	Planner under-thought; pay more for thinking next time

Those three numbers tell me whether to switch models, restructure my CLAUDE.md, or just accept that the task is genuinely hard and pay for thinking. The bill at the end of the month is whatever it is. The point isn't to minimize spend. The point is to know which dollar bought what.

“The economics of a session aren't mysterious. They're in the JSONL, on every turn, in fields that nobody reads because they're at the bottom of the message envelope.”

Read them. The numbers will surprise you. Then they'll change how you work.