Back to stories
Models

Anthropic Ships Claude Opus 4.8 With Fast Mode and Parallel Subagent Workflows

Michael Ouroumis2 min read
Anthropic Ships Claude Opus 4.8 With Fast Mode and Parallel Subagent Workflows

Anthropic released Claude Opus 4.8 on May 28, holding base pricing flat against Opus 4.7 — a reported $5 per million input tokens and $25 per million output — while claiming the new model is roughly 4x less likely than Opus 4.7 to let flaws in its own generated code pass unflagged. For teams running long-horizon coding agents, that reliability delta, not a single headline benchmark, is the pitch.

Coding and agentic reliability lead the release

On agentic coding, Anthropic and early coverage put Opus 4.8 at 69.2% on SWE-Bench Pro, ahead of Opus 4.7 (64.3%), OpenAI's GPT-5.5 (58.6%), and Google's Gemini 3.1 Pro (54.2%). The model docs flag three concrete behavioral gains over 4.7: better long-context handling with fewer compactions and stronger compaction recovery, more reliable reasoning at each effort level, and better tool triggering — fewer cases of skipping a tool call the task required, a complaint some 4.7 users raised.

Anthropic is also leaning on an honesty angle, calling 4.8 its "most honest" model yet, with reduced unsupported claims and better uncertainty calibration. It says the model reaches new highs on prosocial traits and shows misaligned-behavior rates near those of its restricted Mythos model.

Fast mode, effort controls, and cache changes

Fast mode ships as a research preview on the Claude API: set speed: "fast" for up to 2.5x higher output tokens per second from the same model at premium pricing — reported at roughly $10/$50 per million tokens, about a third the cost of the previous fast tier. The effort parameter now defaults to high across the API and Claude Code, with selectable levels for cost and latency tradeoffs.

Opus 4.8 keeps the 1M-token context window by default on the Claude API, Bedrock, and Vertex AI (200k on Microsoft Foundry) and 128k max output. Adaptive thinking is the only thinking mode — temperature, top_p, top_k, and fixed thinking budgets still return 400 errors. The minimum cacheable prompt drops to 1,024 tokens, and new mid-conversation system messages let agentic loops append instructions without busting earlier prompt-cache hits.

Dynamic Workflows and the Mythos hold

A research-preview Dynamic Workflows feature in Claude Code lets the model plan a large task, spin up hundreds of parallel subagents, and verify outputs before reporting back — aimed at migrations touching hundreds of files. Opus 4.8 is live on the Claude Platform, AWS Bedrock, Google Vertex AI, and Microsoft Foundry.

Notably absent: Mythos, Anthropic's frontier system, which remains gated to a "handful of partners" on security grounds, with a broader release teased "in coming weeks." For builders, 4.8 is the practical upgrade today — flat token cost, cheaper fast inference, and fewer silent agent failures — while Anthropic keeps its strongest model in reserve as its IPO race with OpenAI intensifies.

Learn AI for Free — FreeAcademy.ai

Take "AI Essentials: Understanding AI in 2026" — a free course with certificate to master the skills behind this story.

More in Models

Microsoft's MAI-Image-2.5 Hits No. 3 on Arena, Level With Google's Nano Banana 2
Models

Microsoft's MAI-Image-2.5 Hits No. 3 on Arena, Level With Google's Nano Banana 2

Microsoft's in-house MAI-Image-2.5 debuted at No. 3 on LMArena's text-to-image leaderboard, matching Google's Nano Banana 2 and trailing only OpenAI's Image-2 — and it's headed to Microsoft Foundry within two weeks.

1 day ago2 min read
Google's Gemini 3.5 Flash Beats the Pro Tier on Agent Benchmarks — and Ships a Managed Agents API
Models

Google's Gemini 3.5 Flash Beats the Pro Tier on Agent Benchmarks — and Ships a Managed Agents API

At I/O 2026 Google shipped Gemini 3.5 Flash, a Flash-tier model that outscores Gemini 3.1 Pro on coding and agentic benchmarks at less than half the cost of comparable frontier models, alongside a Managed Agents API that spins up tool-using, code-executing agents in a single call.

1 week ago2 min read
Google ships Gemini 3.2 Flash at I/O 2026, undercuts GPT-5.5 by 15-20x on inference cost
Models

Google ships Gemini 3.2 Flash at I/O 2026, undercuts GPT-5.5 by 15-20x on inference cost

Gemini 3.2 Flash debuts at $0.25/M input and $2.00/M output tokens, hitting ~92% of GPT-5.5 on coding and reasoning while rolling out across Search, Maps, Gmail, and Chrome simultaneously.

1 week ago2 min read