Nvidia has dominated AI training for years. Now it wants to own inference too — and it's using $20 billion worth of acquired technology to do it.
The Secret Chip
According to a report from SiliconANGLE, Nvidia is preparing to unveil a new inference-focused processor at its annual GTC developer conference in San Jose later this month. The chip integrates the Language Processing Unit (LPU) architecture that Nvidia licensed from Groq Inc. in December for $20 billion, a deal that also brought over Groq's founding CEO Jonathan Ross and President Sunny Madra.
Groq's LPU architecture takes a fundamentally different approach to inference. Instead of repurposing GPUs designed for training, LPUs are built from the ground up for the token-by-token decode phase of serving language models, with dramatically lower latency and energy consumption.
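As a rough illustration of why that matters, here is a back-of-envelope sketch in Python. The latency and power figures are assumptions invented for the sake of the example, not published specs for Groq's LPU, Nvidia's GPUs, or the unannounced chip.

    # Back-of-envelope comparison: serving one chat response (~500 output tokens)
    # on a general-purpose GPU versus a hypothetical latency-optimized inference chip.
    # Every number below is an illustrative assumption, not a published spec.

    OUTPUT_TOKENS = 500

    # Assumed per-token decode latency (seconds) and power draw (watts).
    chips = {
        "GPU (assumed)": {"latency_per_token": 0.030, "power_watts": 700},
        "LPU-style (assumed)": {"latency_per_token": 0.004, "power_watts": 300},
    }

    for name, chip in chips.items():
        response_time = OUTPUT_TOKENS * chip["latency_per_token"]  # seconds per response
        energy = response_time * chip["power_watts"]                # joules per response
        print(f"{name}: {response_time:.1f} s per response, {energy:.0f} J")

Under these made-up numbers, the latency-optimized chip finishes a response in a fraction of the time while drawing less power, which is the general trade-off the LPU pitch rests on.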
OpenAI Signs On First
The biggest signal of the chip's potential: OpenAI has already committed as the lead customer. The deal includes a massive purchase of dedicated inference capacity, backed by a $30 billion investment from Nvidia into OpenAI's infrastructure. That's not a research partnership — it's a production-scale commitment.
For OpenAI, which runs ChatGPT for over 900 million users, inference costs dwarf training costs. A chip purpose-built for fast, efficient model serving could meaningfully change the economics of running frontier models at consumer scale.
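To see why serving dominates at that scale, consider a rough back-of-envelope calculation in Python. Every figure below is an assumption chosen purely for the arithmetic, not an actual OpenAI or Nvidia number.

    # Rough, purely illustrative arithmetic on inference versus training spend.
    # Every figure here is an assumption made for the example, not a real number.

    users = 900_000_000                  # user base cited in the article
    queries_per_user_per_day = 10        # assumed
    tokens_per_query = 1_000             # assumed (prompt + response)
    cost_per_million_tokens = 0.50       # assumed serving cost in dollars

    daily_tokens = users * queries_per_user_per_day * tokens_per_query
    annual_inference_cost = daily_tokens / 1_000_000 * cost_per_million_tokens * 365

    training_run_cost = 1_000_000_000    # assumed one-time frontier training run

    print(f"Annual inference cost: ${annual_inference_cost:,.0f}")
    print(f"One-time training run: ${training_run_cost:,.0f}")
    # With these assumptions, a year of serving costs exceeds the training run,
    # which is why per-token efficiency drives the economics at consumer scale.

Even with conservative assumptions, recurring serving spend quickly rivals a one-time training run, so any chip that cuts the cost per token compounds across billions of daily requests.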
Why Inference Matters Now
The AI industry has reached an inflection point. Training the biggest models still requires enormous GPU clusters, but the real cost center has shifted. Every ChatGPT response, every Copilot suggestion, every Claude conversation is an inference workload. Companies are spending more on running models than building them.
Nvidia currently controls over 90% of the GPU market for AI training, but inference is more competitive. AMD, Intel, AWS custom silicon, and startups like Cerebras are all targeting the inference market. The Groq deal gives Nvidia an architecture purpose-built for inference, rather than another round of optimizing existing GPU designs.
What to Watch at GTC
GTC 2026 runs later this month and is expected to be Nvidia's biggest product launch since the Blackwell architecture. Beyond the inference chip, CEO Jensen Huang is expected to detail the full Rubin platform roadmap and new software tools for agentic AI workloads.
The inference chip could reshape how AI companies budget their compute. If it delivers on the efficiency promises of Groq's LPU architecture, running frontier models could get a lot cheaper.



