Does Claude Opus 4.8 cost more than 4.7?

No. Input is $5 per million tokens and output is $25 per million tokens, the same pricing as Opus 4.7.

What is the Claude Opus 4.8 model id?

The API model id is claude-opus-4-8.

Should I switch from Claude Opus 4.7 to 4.8?

For most workloads yes, because the price is identical and the benchmark scores are higher. Pin the model id and test on your own workload before moving production traffic.

What are dynamic workflows in Claude Opus 4.8?

Dynamic workflows is a research-preview feature in Claude Code that runs tens to hundreds of parallel subagents with resumable state for multi-day jobs. It consumes far more tokens than a normal session.

All writing

May 28, 20266 min read

Claude Opus 4.8: What Actually Changed for AI Agent Builders

Claude Opus 4.8 dropped May 28, 2026. Same price as 4.7, higher SWE-bench scores, and a model that flags its own mistakes. Here is what actually changed if you build AI agents.

#AI Agents #Claude #Anthropic #Agentic Coding #Automation

Share LinkedIn

Anthropic shipped Claude Opus 4.8 today, May 28, 2026. That is less than two months after 4.7. The upgrade pace is picking up.

If you build AI agents for a living, the headline is not the benchmark jump. It is that the model is better at admitting when it got something wrong. For agent builders, that matters more than another point on a coding test.

Here is what actually changed and what to do about it.

What shipped

Model id: claude-opus-4-8
Price is unchanged from 4.7: $5 per million input tokens, $25 per million output. Prompt caching still cuts up to 90 percent off input cost. Batch processing cuts 50 percent.
1M token context window.
Fast mode is about 2.5 times quicker than before, at the same quality.
A new feature called dynamic workflows is in research preview inside Claude Code.

The price staying flat is the quiet headline. You get higher scores for the same bill.

The benchmarks that matter

Opus 4.8 beats 4.7 across the board. The coding and long-context numbers are the ones agent builders should care about.

Benchmark	Opus 4.7	Opus 4.8
SWE-bench Pro (agentic coding)	64.3%	69.2%
SWE-bench Verified	87.6%	88.6%
USAMO 2026 (math)	69.3%	96.7%
GraphWalks (1M context recall)	40.3%	68.1%

Against the other frontier models on agentic coding, it leads:

Model	SWE-bench Pro
Claude Opus 4.8	69.2%
GPT-5.5	58.6%
Gemini 3.1 Pro	54.2%

Benchmarks are a starting point, not a promise. Test on your own workload before you trust the numbers. I have seen plenty of agents that ace public tests and still fall over in production. More on that in why AI agent pilots fail.

The real change: it flags its own mistakes

The number Anthropic is proudest of is not on a coding leaderboard. Opus 4.8 is about four times less likely than 4.7 to let a flaw in its own code pass without flagging it. Anthropic is calling it their most honest model yet.

For anyone shipping agents, this is the whole game. An agent that quietly writes broken code is worse than one that stops and says it is not sure. The first one costs you a production incident. The second one costs you thirty seconds of review.

A model that surfaces its own uncertainty is easier to build guardrails around. You can route low-confidence steps to a human instead of letting them run.

Dynamic workflows, and the token bill that comes with them

Dynamic workflows let Claude Code take on bigger jobs. It can spin up tens to hundreds of parallel subagents and keep resumable state across multiple days. It is in research preview on Max, Team, and Enterprise plans, and through the API, Bedrock, Vertex, and Microsoft Foundry.

One caveat that is easy to miss: it asks for confirmation before it runs, and it burns far more tokens than a normal Claude Code session. Parallel subagents multiply your spend fast. If you turn this on, watch the meter. This is exactly the kind of feature that produces a surprise invoice. See the real cost of AI agents for how those bills add up.

Should you switch from 4.7?

For most people, yes. The downside is low because the price is identical.

A few practical notes:

Pin the model id so a future default change does not move under you.

client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=conversation,
)

Keep prompt caching on. It still cuts up to 90 percent off input cost, which is most of an agent's bill.
Run your own evals before you flip production traffic. A higher public score does not guarantee a win on your task.
If you are weighing a frontier model against a local one for cost reasons, that tradeoff has not changed much. See running local LLMs in production.

What this means if you ship agents

The trend from 4.7 to 4.8 is clear. Frontier models keep getting better at coding and more honest about their limits. Neither of those removes your job. You still have to wire budgets, rate limits, and failure handling around the model.

A more honest model flags more problems. It does not stop spend. It will not cap your token usage when a dynamic workflow goes sideways at 2 a.m. You still need something watching the money in production. That is the gap AgentGuard fills.

FAQ

Does Opus 4.8 cost more than 4.7? No. Input is $5 per million tokens and output is $25 per million, the same as 4.7.

What is the model id? claude-opus-4-8.

Should I switch from 4.7? Most likely yes. The price is identical and the scores are higher. Pin the id and test on your own workload first.

What are dynamic workflows? A research-preview feature in Claude Code that runs many parallel subagents with resumable state for multi-day jobs. It uses far more tokens than a normal session, so watch your spend.

Building agents on Opus 4.8? Put a budget and a rate limit around them before they surprise you with a bill. AgentGuard is a free, open-source runtime guardrail that caps token spend and request rate for AI agents. Get AgentGuard.

FAQ

Should AI agent builders switch to Claude Opus 4.8?

Only after testing your own workflows. Compare output quality, latency, cost per run, tool behavior, and failure recovery before switching.

What matters more than a model benchmark?

For agents, the full run matters: retries, tool calls, context size, latency, and whether failures stop cleanly with usable logs.

Get the local AI lab notes

Benchmark rows, VRAM fit checks, quant choices, and what actually runs on consumer GPUs. M-F, only when there is something worth sending.

Patrick Hughes

Building BMD HODL — a one-person AI-operated holding company. Nashville, Tennessee. Twenty-Two agents.

Claude Opus 4.8: What Actually Changed for AI Agent Builders

What shipped

The benchmarks that matter

The real change: it flags its own mistakes

Dynamic workflows, and the token bill that comes with them

Should you switch from 4.7?

What this means if you ship agents

FAQ

FAQ

Should AI agent builders switch to Claude Opus 4.8?

What matters more than a model benchmark?

Get the local AI lab notes

More writing

AI Jobs vs Entry-Level Work: A Reality Check for Builders

A self-healing system can't heal an empty queue

What Anthropic's MITRE ATT&CK report means for solo AI builders

Agentic coding moved my bottleneck to code review

Anthropic's IPO and the 40% Cost-Savings Gap: Why Your Spend Cap Matters More Now