What does Gemini 3.5 Flash cost per token?

Official API pricing is $1.50 per million input tokens and $9.00 per million output tokens. Google's 'less than half the cost' claim is relative to comparable frontier-tier models, not Google's prior Flash generation — on a raw per-token basis 3.5 Flash is more expensive than the old Flash, so the savings come from capability-per-dollar on agentic workloads.

What is the Managed Agents API and where can I access it?

A single API call spins up an agent that reasons, calls tools, and executes code inside an isolated, Google-hosted Linux environment — removing the need to self-host agent state and sandboxes. It is available via the Interactions API and in AI Studio at ai.dev/managed-agents.

Google's Gemini 3.5 Flash Beats the Pro Tier on Agent Benchmarks — and Ships a Managed Agents API

Q: Which benchmarks does Gemini 3.5 Flash lead, and against what?

Google reports 3.5 Flash beating Gemini 3.1 Pro on Terminal-Bench 2.1 (76.2%), GDPval-AA (1656 Elo) and MCP Atlas (83.6%), plus 84.2% on CharXiv Reasoning for multimodal understanding. It also claims roughly 4x faster output tokens per second than comparable frontier models.

Google used I/O 2026 to ship Gemini 3.5 Flash, a Flash-tier model that now outscores the previous Pro tier on the benchmarks practitioners actually run agents against — at less than half the cost of comparable frontier models and roughly 4x the output throughput. Google is making it the new default model across the Gemini app, Search, and the API, with rollout starting the day of the keynote.

A Flash model that outruns the Pro tier

Google reports 3.5 Flash beating Gemini 3.1 Pro on the coding and agentic suites that matter for production: Terminal-Bench 2.1 at 76.2%, GDPval-AA at 1656 Elo, and MCP Atlas at 83.6%, plus 84.2% on CharXiv Reasoning for multimodal understanding. The company calls it its "strongest agentic and coding model yet" and pegs throughput at roughly 4x faster output tokens per second than comparable frontier models. Gemini 3.5 Pro is still in testing and is slated for next month.

The headline for builders is the inversion: a Flash-tier model leading the prior Pro tier on agent benchmarks collapses the usual quality-versus-cost tradeoff for high-volume, tool-calling workloads.

The pricing math has a caveat

List pricing is $1.50 per million input tokens and $9.00 per million output tokens. Read the cost claim carefully: the "less than half the cost" pitch is benchmarked against competing frontier-tier models, not Google's own previous Flash — on a strict per-token basis, 3.5 Flash is reportedly pricier than the older Flash. The savings are in capability-per-dollar on agentic tasks, not raw token price.

Google also restructured AI Ultra: a new $100/month entry point, with the former $250 plan now $200/month. Consumer usage moves to a "compute-used" model tied to prompt complexity, with limits refreshing every five hours up to a weekly maximum.

Managed Agents API removes the sandbox plumbing

The more consequential developer release is the Managed Agents API. A single call spins up an agent that reasons, calls tools, and executes code inside an isolated, Google-hosted Linux environment — abstracting away the agent-state and sandbox infrastructure teams currently self-host. It is exposed via the Interactions API and in AI Studio at ai.dev/managed-agents.

Around it, Google added CodeMender, an AI code-security agent for vulnerability detection and remediation, an AI Content Detection API, and Antigravity 2.0, a standalone desktop app and CLI for agent orchestration. Gemini Omni Flash — the any-input, video-output multimodal model — is rolling out to developers and enterprise customers via the Gemini API and Agent Platform API in the coming weeks.

What changes for builders

For anyone running agents at scale, the combination is the story: Pro-tier agentic quality at Flash-tier economics, plus managed execution environments that eliminate a chunk of self-hosted infrastructure. The competitive pressure on per-task inference cost — and on rival agent platforms — just tightened.

— Michael Ouroumis

Google's Gemini 3.5 Flash Beats the Pro Tier on Agent Benchmarks — and Ships a Managed Agents API

A Flash model that outruns the Pro tier

The pricing math has a caveat

Managed Agents API removes the sandbox plumbing

What changes for builders

More in Models

Google ships Gemini 3.2 Flash at I/O 2026, undercuts GPT-5.5 by 15-20x on inference cost

Thinking Machines Lab Debuts 'Interaction Models' — Mira Murati's First Step Into Frontier AI

OpenAI Ships GPT-Realtime-2 With Live Translation and Streaming Whisper, Pushing Voice Agents Toward GPT-5 Reasoning