Zyphra Releases ZAYA1-8B, the First Frontier-Class Reasoning MoE Trained Entirely on AMD

Michael Ouroumis · 2 min read

Zyphra has released ZAYA1-8B, a small open-weight reasoning mixture-of-experts (MoE) model that matches or beats frontier proprietary systems on hard math and coding tasks. The entire training run, from pretraining through supervised fine-tuning, was carried out on AMD Instinct MI300X GPUs. The model dropped on May 6 and continues to dominate AI infrastructure discussions into the back half of this week.

A small MoE punching at the frontier

ZAYA1-8B has just over 8 billion total parameters but activates fewer than 1 billion per token (roughly 760 million active). On standard reasoning benchmarks it goes toe-to-toe with Anthropic's Claude 4.5 Sonnet, Google's Gemini 2.5 Pro, DeepSeek-R1-0528, and Mistral-Small-4-119B. With Zyphra's Markovian RSA test-time compute methodology, the model edges past Claude 4.5 Sonnet on HMMT'25 (89.6 vs 88.3) and surpasses DeepSeek-V3.2 and GPT-OSS-120B on the APEX-shortlist mathematics benchmark under extended compute.
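
The total-versus-active gap is standard MoE bookkeeping: each token is routed to only a few experts, so most parameters sit idle on any given forward pass. Here is a minimal sketch of top-k expert routing; all sizes are illustrative assumptions, not ZAYA1-8B's actual configuration.

```python
# Hypothetical sketch of top-k mixture-of-experts routing (PyTorch).
# All sizes are illustrative; this is NOT ZAYA1-8B's real configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, k=2):
        super().__init__()
        self.k = k
        # ZAYA1 reportedly uses an MLP router; a plain linear router is
        # shown here for brevity.
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)     # keep k experts per token
        weights = F.softmax(weights, dim=-1)           # renormalize the survivors
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e               # tokens sent to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

With 16 experts and k=2 as above, only an eighth of the expert parameters touch any given token; scale the same idea up and an 8-billion-parameter model can keep its per-token footprint well under a billion parameters.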

The weights are released under Apache 2.0 on Hugging Face, with a free serverless endpoint available on Zyphra Cloud.
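
For anyone who wants to try the weights locally, a minimal loading sketch with the transformers library follows; the repo id below is our assumption, so confirm it on the model card before copying.

```python
# Hypothetical usage sketch. The repo id "Zyphra/ZAYA1-8B" is our guess at
# the Hugging Face name, and a custom architecture like this one usually
# needs trust_remote_code=True; check the model card first.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Zyphra/ZAYA1-8B")
model = AutoModelForCausalLM.from_pretrained("Zyphra/ZAYA1-8B",
                                             trust_remote_code=True)

prompt = "Prove that the sum of two odd integers is even."
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```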

Compressed Convolutional Attention

The headline architectural change is Compressed Convolutional Attention (CCA), an attention variant that operates in a compressed latent space and achieves roughly 8x KV-cache compression versus standard attention. That dramatically reduces memory pressure at inference time and makes longer effective context windows tractable on smaller hardware. The model also introduces a novel MLP-based expert router, which improves routing stability over standard linear routers, as well as learned residual scaling.
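
Zyphra has not spelled out CCA's equations in this piece, but the memory arithmetic is easy to illustrate: if the cache stores one narrow latent per token instead of full-width keys and values, its size drops by the compression factor. The sketch below is our own loose illustration, not CCA itself; the layer names, dimensions, and convolution placement are all assumptions.

```python
# Hypothetical illustration of attention over a compressed, convolved KV
# latent (PyTorch). This is NOT Zyphra's published CCA; names, dimensions,
# and the convolution placement are assumptions made for this sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompressedKVAttention(nn.Module):
    def __init__(self, d_model=1024, n_heads=8, compression=8):
        super().__init__()
        # Standard attention caches K and V: 2 * d_model values per token.
        # Here the cache holds one latent of (2 * d_model) / 8 values,
        # which is where the ~8x saving comes from.
        d_latent = (2 * d_model) // compression
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)     # compress before caching
        # Causal depthwise conv over the latent sequence (our guess at where
        # the "convolutional" part could sit).
        self.conv = nn.Conv1d(d_latent, d_latent, kernel_size=3, padding=2,
                              groups=d_latent)
        self.k_up = nn.Linear(d_latent, d_model)        # decompress on the fly
        self.v_up = nn.Linear(d_latent, d_model)
        self.out = nn.Linear(d_model, d_model)
        self.n_heads, self.d_model = n_heads, d_model

    def forward(self, x, latent_cache=None):            # x: (batch, seq, d_model)
        c = self.kv_down(x)                              # (batch, seq, d_latent)
        if latent_cache is not None:                     # cache latents, not K/V
            c = torch.cat([latent_cache, c], dim=1)
        # Truncating the padded conv output keeps it causal.
        c_mixed = self.conv(c.transpose(1, 2))[..., : c.shape[1]].transpose(1, 2)
        q, k, v = self.q_proj(x), self.k_up(c_mixed), self.v_up(c_mixed)
        def heads(t):                                    # (B, L, D) -> (B, H, L, Dh)
            return t.view(t.shape[0], t.shape[1], self.n_heads, -1).transpose(1, 2)
        # Attention mask omitted for brevity.
        y = F.scaled_dot_product_attention(heads(q), heads(k), heads(v))
        y = y.transpose(1, 2).reshape(x.shape[0], x.shape[1], self.d_model)
        return self.out(y), c                            # c is the new cache
```

The inference-time win is that only the latent `c` persists between decoding steps; full-width keys and values are rebuilt from it on the fly, trading a little extra compute for a much smaller cache.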

Zyphra frames the work as maximizing "the intelligence extracted per parameter and per FLOP" — a deliberate counterpoint to the brute-force scaling pursued by frontier labs.

A real win for AMD's training story

ZAYA1-8B was pretrained on 1,024 AMD MI300X GPUs interconnected with AMD Pensando Pollara networking, hosted on IBM Cloud infrastructure. Zyphra describes it as the first MoE model pretrained, midtrained, and supervised fine-tuned end-to-end on an AMD Instinct MI300 stack.

That narrative matters. AMD has spent the last two years trying to convince frontier labs that ROCm and the MI300/MI355 line are viable for training, not only inference. Until now, AMD's marquee training wins have mostly been smaller dense models or partial workloads. A reasoning MoE that beats Claude and Gemini on math — trained entirely on AMD silicon — is the cleanest data point yet that the company's hardware can run a real pretraining campaign without falling back to CUDA.

Implications

For open-source builders, ZAYA1-8B is another shot at the frontier from outside the big labs, this time without the Chinese-export-control complications that surround DeepSeek and Qwen. For AMD, it is the kind of independent, reproducible reference customer that has been notably missing from its AI pitch. And for Nvidia, it is one more piece of evidence that the training-stack moat is shrinking — not gone, but no longer absolute.
