Industry

Google Splits Eighth-Gen TPUs Into Training and Inference Chips for Agentic Era

Michael Ouroumis · 3 min read

Google used the opening keynote at Cloud Next '26 on April 22 to declare the arrival of what it called the 'agentic era' of AI — and to unveil the silicon it believes will power it. The company announced its eighth-generation Tensor Processing Units, splitting the line into two purpose-built chips: TPU 8t for training frontier models and TPU 8i for serving them at scale. The message to Nvidia was unsubtle: Google now intends to compete head-on at both ends of the AI workload.

Two chips instead of one

For seven generations, Google's TPU roadmap produced a single accelerator that tried to balance training and inference. With TPU 8, the company abandoned that compromise. The training-focused TPU 8t scales to 9,600 chips in a single superpod with two petabytes of shared high-bandwidth memory and 121 ExaFlops of compute, delivering close to three times the per-pod performance of the previous Ironwood generation. Google also claims 97% 'goodput' — the share of time the chips spend doing useful work rather than stalling on failures or communication overhead.
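As a rough sanity check on the figures above (this arithmetic is ours, not Google's), the claimed pod numbers imply roughly 12.6 petaflops per chip, and the 97% goodput figure shaves only a few exaflops off the pod's useful throughput:

```python
# Back-of-envelope check of the stated TPU 8t superpod figures:
# 121 exaflops across 9,600 chips at 97% goodput (claims from the keynote).

POD_FLOPS = 121e18   # 121 exaflops per superpod (claimed)
POD_CHIPS = 9_600    # chips per superpod (claimed)
GOODPUT = 0.97       # fraction of time spent on useful work (claimed)

per_chip_pflops = POD_FLOPS / POD_CHIPS / 1e15    # peak compute per chip
effective_exaflops = POD_FLOPS * GOODPUT / 1e18   # goodput-adjusted pod compute

print(f"per-chip peak: {per_chip_pflops:.1f} PFLOPs")         # ~12.6 PFLOPs
print(f"useful pod compute: {effective_exaflops:.1f} EFLOPs") # ~117.4 EFLOPs
```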

The TPU 8i, tuned for inference, takes a different shape. Each chip carries 288 GB of high-bandwidth memory, 384 MB of on-chip SRAM (roughly triple the prior generation), and 19.2 Tb/s of interconnect bandwidth. Google positions it as offering about 80% better performance-per-dollar for inference workloads than Ironwood, and says the two chips together deliver up to 2x better performance-per-watt versus the previous generation.

Virgo Network and the fabric behind the chips

Alongside the accelerators, Google introduced a new data-center networking architecture it calls Virgo Network. According to reporting on the keynote, Virgo provides roughly a 4x increase in bandwidth per accelerator over the previous generation and can link up to 134,000 TPU 8t chips through a non-blocking fabric with up to 47 petabits per second of bisection bandwidth. That scale matters because agentic workloads — long-running reasoning chains, multi-tool pipelines, continuous background tasks — stress interconnect and memory bandwidth far more than classic chatbot traffic.
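To put the 47 Pb/s claim in per-chip terms (again, our own estimate, not a Google figure), assume the worst case where every chip on one side of the bisection sends across it simultaneously:

```python
# Rough per-chip share of Virgo's claimed bisection bandwidth, assuming
# all chips on one side of the cut send across it at once.
# The 47 Pb/s and 134,000-chip figures are claims from the keynote.

BISECTION_BPS = 47e15   # 47 petabits per second (claimed)
MAX_CHIPS = 134_000     # maximum TPU 8t chips in one fabric (claimed)

# Half the chips sit on each side of the bisection cut.
per_chip_gbps = BISECTION_BPS / (MAX_CHIPS / 2) / 1e9

print(f"~{per_chip_gbps:.0f} Gb/s per chip across the bisection")  # ~701 Gb/s
```

Even under that pessimistic assumption, each chip would still see several hundred gigabits per second of cross-fabric bandwidth, which is the kind of headroom long-running agentic pipelines would need.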

Implications: Nvidia, Anthropic, and the cost curve

The split architecture is Google's sharpest attempt yet to undercut Nvidia on the economics of inference, which is increasingly where AI dollars are actually spent. Google's cloud business has been riding a wave of TPU demand from Anthropic, which the company confirmed earlier this year would have access to up to one million TPU chips and more than a gigawatt of capacity in 2026. A cheaper, denser inference chip makes that commitment more defensible — and gives Google a clearer story to tell enterprise customers weighing Nvidia GPUs against custom silicon.

For developers, the takeaway is narrower but still meaningful: the chips that will run the next wave of autonomous agents, long-context assistants, and multi-step workflows are starting to look different from the ones that trained them. Google is betting that bifurcation — and the 'later in 2026' rollout window it set today — will be enough to keep pace as rivals push their own custom accelerators. Nvidia will not concede the inference market without a fight, but after today, it has a visibly sharper challenger.

