Claude Code caching is eating your budget. Here's what's happening.
Claude Code's prompt caching has two TTLs (5 minutes and 1 hour) and two price tiers (a 25% or 100% premium on cache writes), and most people have no idea which one they're paying for. If you run Claude Code in production, you need to read this.
The Register ran a piece this week on Claude Code cache confusion. Short version: users are paying more than they expected, performance is inconsistent, and nobody has a clear mental model for when cache writes get charged at 25% versus 100% over base input tokens.
I've been watching my own Claude Code bill closely. Here's what's happening and what to do about it.
The two TTLs
Claude's prompt caching has two expiration windows:
- 5-minute (ephemeral): the default. Cache writes cost 1.25x base input tokens. Reads cost 0.1x.
- 1-hour: opt-in for longer sessions. Cache writes cost 2x base input tokens. Reads are still 0.1x.
The minimum cacheable block is around 1024 tokens. You get up to 4 cache breakpoints per request.
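At the API level, a cache breakpoint is just a `cache_control` field on a content block. Here's a sketch of what a request with two breakpoints might look like, using the Anthropic Messages API block shape. The model id, block contents, and token counts are illustrative, not what the Claude Code CLI actually sends.

```python
# Sketch of a request payload with two cache breakpoints. Anything before a
# breakpoint is cached as a prefix; 5-minute TTL is the default.

SYSTEM_PROMPT = "..."  # tools + CLAUDE.md + scaffolding, ~40-50k tokens
OPEN_FILES = "..."     # currently-open files, another 5-20k tokens

payload = {
    "model": "claude-sonnet-4-5",  # placeholder model id
    "max_tokens": 4096,
    "system": [
        # Breakpoint 1: the stable prefix that rarely changes.
        {"type": "text", "text": SYSTEM_PROMPT,
         "cache_control": {"type": "ephemeral"}},
        # Breakpoint 2: open files change more often, so they get their own
        # breakpoint -- editing a file only invalidates the cache from here on.
        {"type": "text", "text": OPEN_FILES,
         "cache_control": {"type": "ephemeral"}},
    ],
    "messages": [{"role": "user", "content": "Refactor the auth module."}],
}

# Note: only blocks at or above the minimum cacheable length are actually
# cached; anything smaller is billed as plain input every turn.
```

Splitting stable and volatile context across breakpoints is what keeps a file edit from invalidating the whole cached prefix.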
Tight back-and-forth inside a 5-minute window? The 5-min cache is fine. You write once, read many, amortize the cost.
Coding for an hour, stepping away for coffee, coming back? The 5-min cache expired while you were gone. Next request pays the full write again.
Nobody tells you which mode your session is in at any given moment.
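The amortization argument is quick to check with the multipliers above: uncached, the prefix costs 1.0x base input every turn; cached, you pay the write multiplier once and the read multiplier after that.

```python
def breakeven_turns(write_mult: float, read_mult: float = 0.10) -> int:
    """Smallest turn count at which caching a prefix beats not caching it.

    Uncached: the prefix costs 1.0x base input on every turn.
    Cached:   write_mult once, then read_mult on each later turn.
    """
    n = 1
    while write_mult + read_mult * (n - 1) >= n * 1.0:
        n += 1
    return n

print(breakeven_turns(1.25))  # 5-min cache: pays off from turn 2
print(breakeven_turns(2.00))  # 1-hour cache: pays off from turn 3
```

So both tiers pay for themselves within two or three turns. The catch is the premise: "later turns" only count if they land before the TTL expires.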
What a day of coding actually costs
Rough math. Your Claude Code session has:
- 40-50k tokens of tools, CLAUDE.md, and scaffolding that gets cached
- Some currently-open files (another 5-20k)
- 1-3k tokens per turn of conversation
With 5-min caching over a typical 2-hour session where you idle 4 times (coffee, Slack, lunch, thinking):
- First turn: 1.25x on ~50k cached tokens. That's 12.5k tokens of write premium on top of the base cost.
- Reads inside the 5-min window: about $0.02 per turn at Sonnet rates.
- Each idle longer than 5 minutes triggers another rewrite. Another 12.5k in premium.
Four rewrites in a day = around 50k tokens of pure cache-write premium. At current Sonnet input rates (~$3 per million), that's about $0.15/day per active session in overhead you weren't tracking.
Sounds small. Scale it. Five sessions a day, 20 workdays a month. That's $15/month in cache-write overhead alone. More if your CLAUDE.md is large or you have a big tool set.
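The back-of-envelope math above fits in a few lines. The rate and token counts are this post's assumptions, not published constants, so swap in your own.

```python
BASE_INPUT_PER_MTOK = 3.00   # assumed Sonnet input rate, $/M tokens
WRITE_PREMIUM_5MIN = 0.25    # 5-min cache writes cost 1.25x base input

def daily_write_overhead(cached_tokens: int, rewrites: int) -> float:
    """Dollar cost of the cache-write premium alone, per session per day."""
    premium_tokens = cached_tokens * WRITE_PREMIUM_5MIN * rewrites
    return premium_tokens / 1_000_000 * BASE_INPUT_PER_MTOK

per_session = daily_write_overhead(cached_tokens=50_000, rewrites=4)
monthly = per_session * 5 * 20   # 5 sessions/day, 20 workdays
print(f"${per_session:.2f}/day per session, ${monthly:.2f}/month")
# → $0.15/day per session, $15.00/month
```

Double the cached prefix to 100k tokens, or double the idle-triggered rewrites, and the monthly figure doubles with it.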
The shift to per-token pricing this month makes this worse. You no longer have a flat subscription to absorb the variance.
Why this is worse than it sounds
Three things compound:
Cache writes on every fresh session. Every claude cold-start rewrites the full system prompt. 1.25x on ~40k tokens. At least $0.15 just to start each session.
The 1024-token floor. Blocks under 1024 tokens don't cache at all. You pay full input price every time. Small files and short tool definitions silently fall off the cache path.
The 1-hour option is not automatic. You need to pass cache_control: { type: "ephemeral", ttl: "1h" } in the API call. Claude Code the CLI handles this in most paths, but if you're driving the SDK directly, you're on 5-min caching until you opt in.
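If you're driving the API yourself, the 1-hour opt-in is one extra field on the block you want cached. A sketch of the request body (the model id and prompt are placeholders, and depending on your API version the 1-hour TTL may still sit behind a beta header, so check the current prompt-caching docs):

```python
# Request-body sketch for opting a system-prompt block into the 1-hour cache.
# Without the explicit "ttl" field, you get the default 5-minute cache.

LONG_SYSTEM_PROMPT = "..."  # placeholder for your tools + CLAUDE.md scaffolding

request_body = {
    "model": "claude-sonnet-4-5",  # placeholder model id
    "max_tokens": 4096,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # The opt-in: 2x write cost instead of 1.25x, but the cache
            # survives a coffee break instead of expiring at 5 minutes.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    "messages": [{"role": "user", "content": "Continue where we left off."}],
}
# Pass these fields as kwargs to client.messages.create(...) in the
# official anthropic SDK.
```

For a 40k-token prefix, the 1-hour tier costs one extra 5-min-write's worth up front and buys you eleven fewer expiry windows per hour of idling.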
The Register flagged the bigger issue: the billing surface is opaque. You see the total on your invoice. You don't see cache-write vs cache-read vs base-input split out the way you'd split out the code that caused them.
Per-token pricing makes it a real problem
Anthropic shifted to per-token billing on several tiers this month. Max-plan-style flat absorption is gone on those paths. Every cache miss hits your card.
Per-token plus opaque cache costs = your bill is variable AND unpredictable.
Variable is fine if it's predictable. Unpredictable is fine if it's bounded. Both at once is the exact pattern that sinks AI-dev-tool budgets.
How to control it
Three things, in order:
1. Set a daily token budget, not a monthly one. You want the alert the afternoon you overspend, not at the end of the month when the damage is done.
2. Close idle sessions. A 2-hour tmux-backgrounded Claude Code window is burning cache rewrites on every follow-up. Open a fresh one instead of reviving a stale one.
3. Put a hard cap on every agent process you spawn. If you script Claude Code via claude --headless or the SDK, don't trust the process to self-regulate. Cap the runtime at the guardrail layer, below the model.
Number three is the one most people skip and most often regret.
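A hard cap doesn't need to be sophisticated to work. Here's a minimal sketch of the idea, a guard object your agent loop charges after every model response. This is illustrative only, not AgentGuard's actual API; the rates and cap values are made up.

```python
import time

class BudgetCap:
    """Hypothetical guardrail: hard-stop an agent loop on spend, turns, or time."""

    def __init__(self, max_dollars: float, max_turns: int, max_seconds: float):
        self.max_dollars = max_dollars
        self.max_turns = max_turns
        self.deadline = time.monotonic() + max_seconds
        self.spent = 0.0
        self.turns = 0

    def charge(self, input_tokens: int, output_tokens: int,
               in_rate: float = 3.00, out_rate: float = 15.00) -> None:
        """Record one turn; raise the moment any cap is crossed."""
        self.turns += 1
        self.spent += (input_tokens * in_rate + output_tokens * out_rate) / 1e6
        if self.spent > self.max_dollars:
            raise RuntimeError(f"budget cap hit: ${self.spent:.2f}")
        if self.turns > self.max_turns:
            raise RuntimeError(f"turn cap hit: {self.turns} turns")
        if time.monotonic() > self.deadline:
            raise RuntimeError("timeout cap hit")

cap = BudgetCap(max_dollars=2.00, max_turns=50, max_seconds=600)
# Inside your agent loop, after every model response:
#     cap.charge(resp.usage.input_tokens, resp.usage.output_tokens)
```

The point is where it sits: the exception fires in your process, below the model, so a runaway agent stops at the cap instead of on the invoice.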
What I use
I built AgentGuard for exactly this. Runtime cost and safety guardrails for any agent you spawn, including Claude Code via SDK. Set a dollar budget, a max number of turns, a timeout, and a tool-call rate limit. The agent stops when it crosses a line. You find out at the cap, not on the next invoice.
pip install agentguard. MIT license, OSS. Pro dashboard is $39/mo if you want alerting and historical spend charts.
The Register article is a warning shot. Prompt caching is a sharp tool. Use it wrong and it compounds. Use it right with a hard runtime cap and it's fine.
Don't wait for your next invoice to find out which one you're doing.
Patrick Hughes
Building BMD HODL — a one-person AI-operated holding company. Nashville, Tennessee. Fifteen agents.