Nvidia Is Building a Secret Inference Chip With Groq Tech — And OpenAI Is the First Customer

Michael Ouroumis · 2 min read

Nvidia has dominated AI training for years. Now it wants to own inference too — and it's using $20 billion worth of acquired technology to do it.

The Secret Chip

According to a report from SiliconANGLE, Nvidia is preparing to unveil a new inference-focused processor at its annual GTC developer conference in San Jose later this month. The chip integrates the Language Processing Unit (LPU) architecture that Nvidia licensed from Groq Inc. in December for $20 billion, a deal that also brought aboard Groq's founding CEO Jonathan Ross and President Sunny Madra.

Groq's LPU architecture takes a fundamentally different approach to inference. Instead of repurposing GPUs designed for training, LPUs are built from the ground up to generate language model output tokens with dramatically lower latency and energy consumption.

OpenAI Signs On First

The biggest signal of the chip's potential: OpenAI has already committed as the lead customer. The deal includes a large purchase of dedicated inference capacity, backed by a $30 billion investment from Nvidia in OpenAI's infrastructure. That's not a research partnership — it's a production-scale commitment.

For OpenAI, which runs ChatGPT for over 900 million users, inference costs dwarf training costs. A chip purpose-built for fast, efficient model serving could meaningfully change the economics of running frontier models at consumer scale.

Why Inference Matters Now

The AI industry has reached an inflection point. Training the biggest models still requires enormous GPU clusters, but the real cost center has shifted. Every ChatGPT response, every Copilot suggestion, every Claude conversation is an inference workload. Companies are spending more on running models than building them.

Nvidia currently controls over 90% of the GPU market for AI training, but inference is more competitive. AMD, Intel, AWS custom silicon, and startups like Cerebras are all targeting the inference market. The Groq acquisition gives Nvidia a purpose-built architecture rather than just optimizing existing GPUs.

What to Watch at GTC

GTC 2026 runs later this month and is expected to be Nvidia's biggest product launch since the Blackwell architecture. Beyond the inference chip, CEO Jensen Huang is expected to detail the full Rubin platform roadmap and new software tools for agentic AI workloads.

The inference chip could reshape how AI companies budget their compute. If it delivers on the efficiency promises of Groq's LPU architecture, running frontier models could get dramatically cheaper.
