Google's Gemini 3.5 Flash Beats the Pro Tier on Agent Benchmarks — and Ships a Managed Agents API

Michael Ouroumis · 2 min read
Google used I/O 2026 to ship Gemini 3.5 Flash, a Flash-tier model that, by Google's own numbers, outscores the previous Pro tier on the coding and agentic benchmarks practitioners actually run agents against. Google pitches it at less than half the cost of comparable frontier models and roughly 4x their output throughput. It becomes the new default model across the Gemini app, Search, and the API, with rollout starting the day of the keynote.

A Flash model that outruns the Pro tier

Google reports 3.5 Flash beating Gemini 3.1 Pro on the coding and agentic suites that matter for production. The reported 3.5 Flash scores are 76.2% on Terminal-Bench 2.1, 1656 Elo on GDPval-AA, and 83.6% on MCP Atlas, plus 84.2% on CharXiv Reasoning for multimodal understanding. The company calls it its "strongest agentic and coding model yet" and pegs output speed at roughly 4x the tokens per second of comparable frontier models. Gemini 3.5 Pro is still in testing and is slated for next month.

The headline for builders is the inversion: a Flash-tier model leading the prior Pro tier on agent benchmarks collapses the usual quality-versus-cost tradeoff for high-volume, tool-calling workloads.

The pricing math has a caveat

List pricing is $1.50 per million input tokens and $9.00 per million output tokens. Read the cost claim carefully: the "less than half the cost" pitch is benchmarked against competing frontier-tier models, not Google's own previous Flash — on a strict per-token basis, 3.5 Flash is reportedly pricier than the older Flash. The savings are in capability-per-dollar on agentic tasks, not raw token price.
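To make the distinction between raw token price and capability-per-dollar concrete, here is a minimal sketch of the arithmetic. Only the two list prices come from the figures above. The workload size, the success rates, and the cheaper comparison model's prices are hypothetical values chosen for illustration, and the cost-per-success formula assumes failed runs are retried independently until one succeeds.

```python
# List prices quoted in the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 1.50
OUTPUT_USD_PER_MTOK = 9.00


def run_cost(input_tokens: int, output_tokens: int,
             input_price: float = INPUT_USD_PER_MTOK,
             output_price: float = OUTPUT_USD_PER_MTOK) -> float:
    """Cost in USD of one agent run at the given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_per_success(cost_per_run: float, success_rate: float) -> float:
    """Expected spend per completed task, assuming independent retries.

    With success probability p per run, the expected number of runs
    until the first success is 1 / p.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_run / success_rate


# Hypothetical workload: 200k input tokens and 20k output tokens per run.
flash_run = run_cost(200_000, 20_000)

# Hypothetical cheaper model at $0.50 / $3.00 per million tokens.
cheap_run = run_cost(200_000, 20_000, input_price=0.50, output_price=3.00)

# Hypothetical task success rates, not benchmark results.
flash_task = cost_per_success(flash_run, success_rate=0.75)
cheap_task = cost_per_success(cheap_run, success_rate=0.20)

print(f"per run:     {flash_run:.2f} vs {cheap_run:.2f} USD")
print(f"per success: {flash_task:.2f} vs {cheap_task:.2f} USD")
```

Under these made-up inputs, the model that costs three times as much per run ends up cheaper per completed task, which is the shape of the claim being made. Whether it holds for a real workload depends entirely on the measured success rates.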

Google also restructured AI Ultra: a new $100/month entry point, with the former $250 plan now $200/month. Consumer usage moves to a "compute-used" model tied to prompt complexity, with limits refreshing every five hours up to a weekly maximum.

Managed Agents API removes the sandbox plumbing

The more consequential developer release is the Managed Agents API. A single call spins up an agent that reasons, calls tools, and executes code inside an isolated, Google-hosted Linux environment — abstracting away the agent-state and sandbox infrastructure teams currently self-host. It is exposed via the Interactions API and in AI Studio at ai.dev/managed-agents.

Around it, Google added CodeMender, an AI code-security agent for vulnerability detection and remediation, an AI Content Detection API, and Antigravity 2.0, a standalone desktop app and CLI for agent orchestration. Gemini Omni Flash — the any-input, video-output multimodal model — is rolling out to developers and enterprise customers via the Gemini API and Agent Platform API in the coming weeks.

What changes for builders

For anyone running agents at scale, the combination is the story: Pro-tier agentic quality at Flash-tier economics, plus managed execution environments that eliminate a chunk of self-hosted infrastructure. The competitive pressure on per-task inference cost, and on rival agent platforms, just increased.

— Michael Ouroumis
