The 529 Cascade
Anthropic returns 529 (overload) intermittently. Naive retries burn through your own rate-limit bucket and storm the API a hundred times over. Here's the retry policy that costs less, not more.
The first time I hit a 529 from Anthropic's API, I had three concurrent agent jobs running. All three got the same error code at roughly the same wall-clock moment. All three retried after one second. All three got 529 again. All three retried after two seconds. By the time the API was back to normal, I'd burned 142 retry attempts on three jobs, consumed about 712,000 tokens of redundant requests, and learned a lesson about retry storms that I've never had to learn twice.
Status code 529 is Anthropic's overload signal. It means the upstream is full and you should back off. The trap is that "back off" doesn't mean "wait and try again immediately." It means "back off in a way that doesn't make the problem worse," and the difference is the entire engineering content of this post.
What 529 Actually Means
The HTTP 529 status code is a non-standard extension Anthropic uses for overload conditions on their own infrastructure — distinct from 429 (you specifically are over your quota), 503 (the upstream is down entirely), and 502 (something failed in transit). The semantic distinction matters because the retry strategy is different for each:
1. 429 — you specifically are rate-limited. Backing off helps you. The system at large is fine.
2. 502/503 — the system at large is broken. Backing off doesn't help anyone; you're waiting for a fix.
3. 529 — the system at large is overloaded. Backing off helps the system. Your retries are the load.
The third case is the one most retry libraries get wrong. They treat 529 like 429 — exponential backoff with jitter, retry up to N times, succeed eventually. That works for a single client. It fails catastrophically when many clients do the same thing simultaneously. The "back off" wave from each client doesn't disperse the load; it concentrates it at predictable intervals (second 1, second 2, second 4, second 8). Every client retries at exactly second 4. The API gets hammered again. Every client retries at second 8. Hammered again.
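To make the distinction concrete, here's a minimal sketch of how I think about the three cases. The strategy labels are just names for the list above, not anything from the Anthropic SDK:

```ts
// Minimal sketch: map a status code to the retry posture it calls for.
type RetryStrategy = "back-off-for-you" | "wait-for-a-fix" | "back-off-for-the-system";

function classifyStatus(status: number): RetryStrategy | null {
  switch (status) {
    case 429:
      return "back-off-for-you";        // your quota; backing off helps you
    case 502:
    case 503:
      return "wait-for-a-fix";          // the system is broken; retries just poll for recovery
    case 529:
      return "back-off-for-the-system"; // the system is overloaded; your retries are the load
    default:
      return null;                      // not a retryable limit/overload condition
  }
}
```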
The Retry-Storm Math
Three concurrent jobs, each retrying with naive exponential backoff (1s, 2s, 4s, 8s, 16s, 32s, capped at 60s), six retries before giving up. That's 18 retries per overload event if all three jobs hit it simultaneously.
Now imagine 50 concurrent users on the same Anthropic API getting the same 529 at the same time. 50 × 6 = 300 retries, all hitting the API at the same predictable moments. The API stays in overload because every "recovery window" gets immediately consumed by the retry wave. The overload condition extends for minutes when it could have resolved in seconds.
The math gets worse when you account for the fact that overloaded sessions tend to be the long-context, expensive ones. Each retry isn't a 1KB request; it's potentially a 200KB request with a long conversation history. 300 retries × 200KB = 60MB of bandwidth consumed by redundant requests during a single overload event.
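As a back-of-envelope script, using the same assumed figures (50 clients, six retries each, ~200KB request bodies):

```ts
// Back-of-envelope cost of one overload event, using the figures above.
const clients = 50;
const retriesPerClient = 6;
const requestBytes = 200_000;                    // one long-context request body, ~200KB

const totalRetries = clients * retriesPerClient; // 300 redundant requests
const redundantMB = (totalRetries * requestBytes) / 1_000_000; // 60 MB of redundant traffic

console.log({ totalRetries, redundantMB });
```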
I instrumented my own retries, and the 142-attempts-burned-712k-tokens number above is real, from a single session in the JSONL log. Three jobs, each averaging 47 retry attempts, with 5K-token request bodies on the high end. The session cost on those jobs alone (most failed) was about $11.40, money spent on requests that mostly returned 529.
The Pattern That Works
Four components, working together:
1. Exponential backoff with full jitter — randomize the retry interval to disperse the wave
2. Circuit breaker — stop retrying entirely if the failure rate is too high
3. Fallback model — if Opus is overloaded, try Sonnet; if Sonnet is overloaded, queue for later
4. Hedged abandonment — if the request can be served by another path, don't retry; switch
The first three are well-known patterns. The fourth is the one most people don't implement. The full implementation lives in the companion repo; here's its shape.
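What follows is a condensed sketch of that shape, assuming TypeScript. The names (CircuitBreaker, retryWithJitter, err.status) are illustrative, not the package's actual exports:

```ts
// Condensed sketch: circuit breaker + full-jitter retry for 529 responses.
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private consecutiveFailures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5, // open after five consecutive failures
    private readonly holdMs = 30_000,      // stay open for 30s before probing
  ) {}

  canRequest(): boolean {
    if (this.state === "OPEN" && Date.now() - this.openedAt >= this.holdMs) {
      this.state = "HALF_OPEN";            // let a single probe through
    }
    return this.state !== "OPEN";
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = "CLOSED";
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.state === "HALF_OPEN" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = Date.now();
    }
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithJitter<T>(
  fn: () => Promise<T>,
  breaker: CircuitBreaker,
  { maxAttempts = 5, baseDelayMs = 1_000, maxDelayMs = 60_000 } = {},
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (!breaker.canRequest()) throw new Error("circuit open: not retrying");
    try {
      const result = await fn();
      breaker.recordSuccess();
      return result;
    } catch (err: any) {
      breaker.recordFailure();
      if (err?.status !== 529 || attempt === maxAttempts - 1) throw err;
      // Full jitter: a uniform draw over [0, cappedExponential), never the cap itself.
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await sleep(Math.random() * cap);
    }
  }
  throw new Error("unreachable");
}
```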
The full version is fifty-eight lines. The circuit breaker tracks failures across calls, opens after five consecutive failures, lets one request through after 30 seconds (HALF_OPEN), and either re-opens or closes based on that probe's outcome. The retry function uses full jitter — Math.random() * baseDelay instead of baseDelay directly. That single change disperses the retry wave from a predictable spike into a uniform distribution, and it's the difference between making the overload worse and helping it resolve.
Full Jitter Beats Capped Exponential
The naive exponential backoff is delay = base * 2^attempt. The jittered version is delay = random(0, base * 2^attempt). The math:
With naive backoff, all clients retry at exactly second 1, then exactly second 3, then exactly second 7. Predictable spikes. With full jitter, the retries spread across [0,1], then [0,3], then [0,7]. The spikes flatten. The API sees a smooth distribution of recovery requests instead of three crushing waves.
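A toy simulation of that claim (50 clients, 1s base delay, three retries; bucketed by whole seconds just to make the spikes visible):

```ts
// Toy simulation: when do 50 clients' retries land, with and without jitter?
function retryTimes(jitter: boolean, attempts = 3, baseMs = 1_000): number[] {
  const times: number[] = [];
  let t = 0;
  for (let attempt = 0; attempt < attempts; attempt++) {
    const backoff = baseMs * 2 ** attempt;
    t += jitter ? Math.random() * backoff : backoff;
    times.push(t);
  }
  return times;
}

function histogram(jitter: boolean, clients = 50): Record<number, number> {
  const buckets: Record<number, number> = {};
  for (let c = 0; c < clients; c++) {
    for (const t of retryTimes(jitter)) {
      const sec = Math.floor(t / 1000);
      buckets[sec] = (buckets[sec] ?? 0) + 1;
    }
  }
  return buckets;
}

console.log("naive:   ", histogram(false)); // spikes: 50 at s1, 50 at s3, 50 at s7
console.log("jittered:", histogram(true));  // spread roughly across s0..s6
```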
The change is a single line of code. The effect on overload duration is dramatic. In my own measurements (small sample, but consistent), switching from no-jitter to full-jitter cut my mean recovery time from 47 seconds to 14 seconds during a 529 burst.
The Fallback Model Pattern
When Opus is overloaded, Sonnet usually isn't. The two models share infrastructure but the load characteristics differ — Opus is more expensive per call and gets used by fewer concurrent jobs, but those jobs are heavier and saturate it more easily. When you hit a 529 on Opus, switching to Sonnet for the next attempt often succeeds immediately:
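A minimal sketch of the switch, assuming TypeScript. Here callModel stands in for however you actually invoke the API, and the fallback policy is passed per call site:

```ts
// Sketch: on a 529 from the primary model, try the fallback model once before backing off.
async function withModelFallback<T>(
  callModel: (model: string) => Promise<T>,
  primary: string,
  fallback: string | null, // null = this call site opts out and waits out the overload instead
): Promise<T> {
  try {
    return await callModel(primary);
  } catch (err: any) {
    if (err?.status !== 529 || fallback === null) throw err;
    // Primary is overloaded; the fallback usually is not, so one immediate attempt is cheap.
    return await callModel(fallback);
  }
}
```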
The trade-off is quality. Sonnet's output on a hard task is worse than Opus's. For most use cases, "worse output now" beats "ideal output in 47 seconds when the storm clears." For some use cases (irreversible decisions, expensive downstream actions), waiting is correct. The fallback decision is per-task; my code lets each call site declare its own fallback policy.
The Hedged Abandonment Pattern
The pattern most people miss: if the work can be served by something other than the API, do that instead. The 529 response means "this path is congested." If you have an alternative path — a cached result, a different provider, a precomputed value — switch.
For my content pipeline, the syndication scheduler often retries because the API is busy. The hedged abandonment for that case: if the scheduled post can be queued for the next scheduling pass (typically within an hour), abandon the retry and re-queue. The pipeline doesn't care which pass actually publishes; it cares that the post eventually publishes within its window.
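A sketch of that decision, assuming TypeScript. ScheduledPost, publishWith, requeue, and the model IDs are all illustrative stand-ins for my pipeline's actual types:

```ts
// Sketch of the hedged-abandonment decision for the scheduler case.
interface ScheduledPost {
  deadline: number; // epoch ms by which the post must be published
}

type Outcome =
  | { kind: "requeued" }                    // picked up by the next scheduling pass
  | { kind: "published"; model: string }
  | { kind: "failed"; error: unknown };     // the hard error, only if neither path works

const NEXT_PASS_MS = 60 * 60 * 1000;        // next scheduling pass is roughly an hour away

async function publishOrHedge(
  post: ScheduledPost,
  publishWith: (model: string) => Promise<void>,
  requeue: (post: ScheduledPost) => Promise<void>,
): Promise<Outcome> {
  try {
    await publishWith("primary-model");     // placeholder model id
    return { kind: "published", model: "primary-model" };
  } catch (err: any) {
    if (err?.status !== 529) return { kind: "failed", error: err };

    // Plenty of slack before the deadline: abandon the retry and let the next pass handle it.
    if (post.deadline - Date.now() > NEXT_PASS_MS) {
      await requeue(post);
      return { kind: "requeued" };
    }

    // Deadline is tight: ship now on the cheaper model rather than wait out the storm.
    try {
      await publishWith("haiku-model");     // placeholder model id
      return { kind: "published", model: "haiku-model" };
    } catch (fallbackErr) {
      return { kind: "failed", error: fallbackErr };
    }
  }
}
```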
The full version is twenty-one lines. The decision: if the deadline is far enough away, requeue (free, no retry burden on the API). If the deadline is tight, fall back to Haiku (lower cost, lower quality, but ships on time). Only escalate to a hard error if neither path works.
What I Run In Production
The api-limit-recovery package is what I install on every project that calls the Anthropic API at any scale. It implements all four patterns: jittered backoff, circuit breaker, fallback model, hedged abandonment. The defaults are conservative — five-attempt cap, 60s max delay, full jitter, 30s circuit-breaker hold time. Override per call site if needed.
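Spelled out as a config object (the option names here are illustrative, not the package's exact keys, but the values are the defaults just listed):

```ts
// The conservative defaults described above, expressed as a config object.
// Option names are illustrative placeholders.
const recoveryDefaults = {
  maxAttempts: 5,        // five-attempt cap
  maxDelayMs: 60_000,    // 60s max delay
  jitter: "full",        // full jitter
  breakerHoldMs: 30_000, // 30s circuit-breaker hold time
} as const;
```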
After eight months of running this in production across the syndication pipeline, the multi-agent jobs, and a few experiments, I've never had a retry storm originate from my own infrastructure. The 529s I do see resolve cleanly because my retries don't add to the load — they disperse and back off, the circuit breaker opens when needed, the fallback paths absorb the overflow.
“The 712,000 tokens I burned on that first storm were an expensive lesson. Five hundred twenty-nine is the protocol's way of asking for cooperation: back off, jitter, retry — not a hard refusal. Treating it that way is what keeps the cost of retries below the cost of the original calls.”