Back to stories
Models

Google ships Gemini 3.2 Flash at I/O 2026, undercuts GPT-5.5 by 15-20x on inference cost

Michael Ouroumis2 min read
Google ships Gemini 3.2 Flash at I/O 2026, undercuts GPT-5.5 by 15-20x on inference cost

Google opened Google I/O 2026 at Shoreline Amphitheatre on Tuesday with Gemini 3.2 Flash as the headline model, priced at $0.25 per million input tokens and $2.00 per million output tokens. Developers had already extracted those pricing entries from Google AI Studio metadata before the keynote when the model surfaced quietly in the iOS Gemini app and AI Studio on May 5 with no press release. Google's positioning today: cost-per-inference at roughly one-fifteenth to one-twentieth of GPT-5.5, query latencies most often under 200 ms.

Deployment scope is the announcement

The price tag matters because of where 3.2 Flash is going: simultaneously into Search, Maps, YouTube, Docs, Gmail, and Chrome at billion-user scale. Sundar Pichai's pitch isn't a leaderboard win — it's that the model is cheap enough to serve under every Google surface without an obvious unit-economics blowup. Search-ad revenue is, in effect, subsidizing a frontier-tier Flash model down to a price point Anthropic and OpenAI cannot match without margin compression.

How it stacks against GPT-5.5

Google's comparison materials peg 3.2 Flash at roughly 92% of GPT-5.5 on coding and reasoning. That's the capability gap Google is willing to live with given the per-token delta. Pre-keynote, blind Arena evaluations from developers running the leaked model flagged a Flash-tier system beating Gemini 3.1 Pro on creative coding — including complex single-pass SVG generation that 3.1 Pro failed without retries.

For builders, the practical question is whether a 92%-of-frontier model at 1/15th the cost wins on total cost of ownership. For most production agent workflows — RAG pipelines, agent loops, batch transformations where output volume and retry budgets dominate the bill — the answer is yes.

What else shipped today

The I/O lineup also confirms:

Implications for builders

The biggest signal is pricing pressure. Anthropic's Opus 4.7 tokenizer hike and OpenAI's recent API moves had reset the cost floor upward for frontier-tier work. Gemini 3.2 Flash resets it back down, and Google has the search-ad subsidy to keep it there. Expect downstream price moves from competitors over the next two earnings cycles — and re-architecting opportunities for any agent stack currently paying GPT-5.5 rates on tasks that don't actually need the top tier.

Learn AI for Free — FreeAcademy.ai

Take "AI Essentials: Understanding AI in 2026" — a free course with certificate to master the skills behind this story.

More in Models

Google's Gemini 3.5 Flash Beats the Pro Tier on Agent Benchmarks — and Ships a Managed Agents API
Models

Google's Gemini 3.5 Flash Beats the Pro Tier on Agent Benchmarks — and Ships a Managed Agents API

At I/O 2026 Google shipped Gemini 3.5 Flash, a Flash-tier model that outscores Gemini 3.1 Pro on coding and agentic benchmarks at less than half the cost of comparable frontier models, alongside a Managed Agents API that spins up tool-using, code-executing agents in a single call.

4 hours ago2 min read
Thinking Machines Lab Debuts 'Interaction Models' — Mira Murati's First Step Into Frontier AI
Models

Thinking Machines Lab Debuts 'Interaction Models' — Mira Murati's First Step Into Frontier AI

Mira Murati's Thinking Machines Lab released a research preview of 'interaction models,' a new class of full-duplex multimodal AI that listens, sees and speaks at the same time, with turn-taking latency reported at about 0.4 seconds.

1 week ago2 min read
OpenAI Ships GPT-Realtime-2 With Live Translation and Streaming Whisper, Pushing Voice Agents Toward GPT-5 Reasoning
Models

OpenAI Ships GPT-Realtime-2 With Live Translation and Streaming Whisper, Pushing Voice Agents Toward GPT-5 Reasoning

OpenAI launched three new audio models on May 7 — GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper — adding GPT-5-class reasoning, a 128K context window, and metered live translation across 70 languages to its already-GA Realtime API.

1 week ago3 min read