Zyphra Releases ZAYA1-8B, the First Frontier-Class Reasoning MoE Trained Entirely on AMD

Michael Ouroumis · 2 min read

Zyphra has released ZAYA1-8B, a small open-weight reasoning mixture-of-experts (MoE) model that matches or beats frontier proprietary systems on hard math and coding tasks. The entire training run, from pretraining through supervised fine-tuning, was carried out on AMD Instinct MI300X GPUs. The model dropped on May 6 and continues to dominate AI infrastructure discussions into the back half of this week.

A small MoE punching at the frontier

ZAYA1-8B has just over 8 billion total parameters but activates fewer than 1 billion per token (roughly 760 million active). On standard reasoning benchmarks it goes toe-to-toe with Anthropic's Claude 4.5 Sonnet, Google's Gemini 2.5 Pro, DeepSeek-R1-0528, and Mistral-Small-4-119B. With Zyphra's Markovian RSA test-time compute methodology, the model edges past Claude 4.5 Sonnet on HMMT'25 (89.6 vs 88.3) and surpasses DeepSeek-V3.2 and GPT-OSS-120B on the APEX-shortlist mathematics benchmark under extended compute.
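
The total-versus-active gap is standard MoE bookkeeping: each token is routed to only a few experts, so most parameters sit idle on any given forward pass. Here is a minimal sketch of top-k expert routing; all sizes are illustrative assumptions, not ZAYA1-8B's actual configuration.

```python
# Hypothetical sketch of top-k mixture-of-experts routing (PyTorch).
# All sizes are illustrative; this is NOT ZAYA1-8B's real configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, k=2):
        super().__init__()
        self.k = k
        # ZAYA1 reportedly uses an MLP router; a plain linear router is
        # shown here for brevity.
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)     # keep k experts per token
        weights = F.softmax(weights, dim=-1)           # renormalize the survivors
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e               # tokens sent to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

With 16 experts and k=2 as above, only an eighth of the expert parameters touch any given token; scale the same idea up and an 8-billion-parameter model can keep its per-token footprint well under a billion parameters.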

The weights are released under Apache 2.0 on Hugging Face, with a free serverless endpoint available on Zyphra Cloud.
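
For anyone who wants to try the weights locally, a minimal loading sketch with the transformers library follows; the repo id below is our assumption, so confirm it on the model card before copying.

```python
# Hypothetical usage sketch. The repo id "Zyphra/ZAYA1-8B" is our guess at
# the Hugging Face name, and a custom architecture like this one usually
# needs trust_remote_code=True; check the model card first.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Zyphra/ZAYA1-8B")
model = AutoModelForCausalLM.from_pretrained("Zyphra/ZAYA1-8B",
                                             trust_remote_code=True)

prompt = "Prove that the sum of two odd integers is even."
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```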

Compressed Convolutional Attention

The headline architectural change is Compressed Convolutional Attention (CCA), an attention variant that operates in a compressed latent space and achieves roughly 8x KV-cache compression versus standard attention. That dramatically reduces memory pressure at inference time and makes longer effective context windows tractable on smaller hardware. The model also introduces a novel MLP-based expert router, which improves routing stability over standard linear routers, as well as learned residual scaling.
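
Zyphra has not spelled out CCA's equations in this piece, but the memory arithmetic is easy to illustrate: if the cache stores one narrow latent per token instead of full-width keys and values, its size drops by the compression factor. The sketch below is our own loose illustration, not CCA itself; the layer names, dimensions, and convolution placement are all assumptions.

```python
# Hypothetical illustration of attention over a compressed, convolved KV
# latent (PyTorch). This is NOT Zyphra's published CCA; names, dimensions,
# and the convolution placement are assumptions made for this sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompressedKVAttention(nn.Module):
    def __init__(self, d_model=1024, n_heads=8, compression=8):
        super().__init__()
        # Standard attention caches K and V: 2 * d_model values per token.
        # Here the cache holds one latent of (2 * d_model) / 8 values,
        # which is where the ~8x saving comes from.
        d_latent = (2 * d_model) // compression
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)     # compress before caching
        # Causal depthwise conv over the latent sequence (our guess at where
        # the "convolutional" part could sit).
        self.conv = nn.Conv1d(d_latent, d_latent, kernel_size=3, padding=2,
                              groups=d_latent)
        self.k_up = nn.Linear(d_latent, d_model)        # decompress on the fly
        self.v_up = nn.Linear(d_latent, d_model)
        self.out = nn.Linear(d_model, d_model)
        self.n_heads, self.d_model = n_heads, d_model

    def forward(self, x, latent_cache=None):            # x: (batch, seq, d_model)
        c = self.kv_down(x)                              # (batch, seq, d_latent)
        if latent_cache is not None:                     # cache latents, not K/V
            c = torch.cat([latent_cache, c], dim=1)
        # Truncating the padded conv output keeps it causal.
        c_mixed = self.conv(c.transpose(1, 2))[..., : c.shape[1]].transpose(1, 2)
        q, k, v = self.q_proj(x), self.k_up(c_mixed), self.v_up(c_mixed)
        def heads(t):                                    # (B, L, D) -> (B, H, L, Dh)
            return t.view(t.shape[0], t.shape[1], self.n_heads, -1).transpose(1, 2)
        # Attention mask omitted for brevity.
        y = F.scaled_dot_product_attention(heads(q), heads(k), heads(v))
        y = y.transpose(1, 2).reshape(x.shape[0], x.shape[1], self.d_model)
        return self.out(y), c                            # c is the new cache
```

The inference-time win is that only the latent `c` persists between decoding steps; full-width keys and values are rebuilt from it on the fly, trading a little extra compute for a much smaller cache.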

Zyphra frames the work as maximizing "the intelligence extracted per parameter and per FLOP" — a deliberate counterpoint to the brute-force scaling pursued by frontier labs.

A real win for AMD's training story

ZAYA1-8B was pretrained on 1,024 AMD MI300X GPUs interconnected with AMD Pensando Pollara networking, hosted on IBM Cloud infrastructure. Zyphra describes it as the first MoE model pretrained, midtrained, and supervised fine-tuned end-to-end on an AMD Instinct MI300 stack.

That narrative matters. AMD has spent the last two years trying to convince frontier labs that ROCm and the MI300/MI355 line are viable for training, not only inference. Until now, AMD's marquee training wins have mostly been smaller dense models or partial workloads. A reasoning MoE that beats Claude and Gemini on math — trained entirely on AMD silicon — is the cleanest data point yet that the company's hardware can run a real pretraining campaign without falling back to CUDA.

Implications

For open-source builders, ZAYA1-8B is another shot at the frontier from outside the big labs, this time without the Chinese-export-control complications that surround DeepSeek and Qwen. For AMD, it is the kind of independent, reproducible reference customer that has been notably missing from its AI pitch. And for Nvidia, it is one more piece of evidence that the training-stack moat is shrinking — not gone, but no longer absolute.
