How is Claude Opus 4.8 priced versus Opus 4.7?

Base pricing is unchanged at a reported $5 per million input tokens and $25 per million output. The new fast mode runs up to 2.5x faster at a reported $10/$50 per million tokens, about a third the cost of the previous fast tier.

What changed for agentic coding?

Anthropic and early coverage put Opus 4.8 at 69.2% on SWE-Bench Pro (up from 64.3% on Opus 4.7), with better long-context handling, fewer compactions, improved tool triggering, and roughly 4x fewer unflagged flaws in self-generated code.

Where can I run it and what is the context limit?

It is available on the Claude Platform, AWS Bedrock, Google Vertex AI, and Microsoft Foundry, with a 1M-token context window by default (200k on Foundry), 128k max output, and adaptive-thinking-only controls.

Anthropic Ships Claude Opus 4.8 With Fast Mode and Parallel Subagent Workflows

Anthropic released Claude Opus 4.8 on May 28, holding base pricing flat against Opus 4.7 — a reported $5 per million input tokens and $25 per million output — while claiming the new model is roughly 4x less likely than Opus 4.7 to let flaws in its own generated code pass unflagged. For teams running long-horizon coding agents, that reliability delta, not a single headline benchmark, is the pitch.

Coding and agentic reliability lead the release

On agentic coding, Anthropic and early coverage put Opus 4.8 at 69.2% on SWE-Bench Pro, ahead of Opus 4.7 (64.3%), OpenAI's GPT-5.5 (58.6%), and Google's Gemini 3.1 Pro (54.2%). The model docs flag three concrete behavioral gains over 4.7: better long-context handling with fewer compactions and stronger compaction recovery, more reliable reasoning at each effort level, and better tool triggering — fewer cases of skipping a tool call the task required, a complaint some 4.7 users raised.

Anthropic is also leaning on an honesty angle, calling 4.8 its "most honest" model yet, with reduced unsupported claims and better uncertainty calibration. It says the model reaches new highs on prosocial traits and shows misaligned-behavior rates near those of its restricted Mythos model.

Fast mode, effort controls, and cache changes

Fast mode ships as a research preview on the Claude API: set speed: "fast" for up to 2.5x higher output tokens per second from the same model at premium pricing — reported at roughly $10/$50 per million tokens, about a third the cost of the previous fast tier. The effort parameter now defaults to high across the API and Claude Code, with selectable levels for cost and latency tradeoffs.

Opus 4.8 keeps the 1M-token context window by default on the Claude API, Bedrock, and Vertex AI (200k on Microsoft Foundry) and 128k max output. Adaptive thinking is the only thinking mode — temperature, top_p, top_k, and fixed thinking budgets still return 400 errors. The minimum cacheable prompt drops to 1,024 tokens, and new mid-conversation system messages let agentic loops append instructions without busting earlier prompt-cache hits.

Dynamic Workflows and the Mythos hold

A research-preview Dynamic Workflows feature in Claude Code lets the model plan a large task, spin up hundreds of parallel subagents, and verify outputs before reporting back — aimed at migrations touching hundreds of files. Opus 4.8 is live on the Claude Platform, AWS Bedrock, Google Vertex AI, and Microsoft Foundry.

Notably absent: Mythos, Anthropic's frontier system, which remains gated to a "handful of partners" on security grounds, with a broader release teased "in coming weeks." For builders, 4.8 is the practical upgrade today — flat token cost, cheaper fast inference, and fewer silent agent failures — while Anthropic keeps its strongest model in reserve as its IPO race with OpenAI intensifies.

Anthropic Ships Claude Opus 4.8 With Fast Mode and Parallel Subagent Workflows

Coding and agentic reliability lead the release

Fast mode, effort controls, and cache changes

Dynamic Workflows and the Mythos hold

More in Models

Microsoft's MAI-Image-2.5 Hits No. 3 on Arena, Level With Google's Nano Banana 2

Google's Gemini 3.5 Flash Beats the Pro Tier on Agent Benchmarks — and Ships a Managed Agents API

Google ships Gemini 3.2 Flash at I/O 2026, undercuts GPT-5.5 by 15-20x on inference cost