
AI21 Labs Releases Jamba 2, a Hybrid SSM-Transformer That Matches GPT-5 at One-Fifth the Cost

Michael Ouroumis · 2 min read

AI21 Labs has released Jamba 2, a 398-billion-parameter model that takes a fundamentally different approach to architecture by interleaving Mamba-style state space model (SSM) layers with traditional transformer attention layers. The result matches GPT-5 and Claude Sonnet 4.5 on major reasoning benchmarks while running inference at roughly one-fifth the cost.

How the Hybrid Architecture Works

Pure transformer models compute attention across all tokens in a sequence, creating quadratic scaling costs as context windows grow. Jamba 2 replaces a significant portion of these attention layers with SSM layers based on the Mamba architecture, which process sequences in linear time by maintaining a compressed state representation instead of attending to every previous token.
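The scaling difference can be made concrete with a toy FLOP model. The dimensions and state size below are illustrative assumptions, not Jamba 2's published configuration:

```python
# Toy per-layer FLOP model: attention cost grows quadratically with
# sequence length, while an SSM scan grows linearly. All numbers here
# are illustrative assumptions, not AI21's disclosed architecture.

def attention_flops(seq_len: int, d_model: int) -> int:
    # QK^T score matrix plus the weighted sum over values: ~2 * n^2 * d
    return 2 * seq_len**2 * d_model

def ssm_flops(seq_len: int, d_model: int, state_dim: int) -> int:
    # One recurrent scan over the sequence: ~2 * n * d * s, linear in n
    return 2 * seq_len * d_model * state_dim

d, s = 4096, 16  # hypothetical model width and SSM state size
for n in (4_096, 65_536, 262_144):
    ratio = attention_flops(n, d) / ssm_flops(n, d, s)
    print(f"n={n:>7}: attention / SSM per-layer cost ratio = {ratio:,.0f}x")
```

With these assumptions the ratio works out to n/s, so the gap widens in direct proportion to context length — which is why replacing attention layers with SSM layers pays off most at long contexts.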

The attention layers that remain handle tasks where precise token-to-token relationships matter — retrieval, exact matching, and fine-grained reasoning. The SSM layers handle long-range dependency tracking, summarization, and general language modeling. AI21 reports that this division of labor is what makes the cost reduction possible without sacrificing quality.
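A hybrid stack of this kind can be sketched as an interleaved layer schedule. The 1-in-8 attention ratio below is an assumption (it matches the first Jamba release), not a confirmed Jamba 2 detail:

```python
# Sketch of a hybrid layer schedule: mostly SSM layers, with periodic
# attention layers for work that needs exact token-to-token comparison.
# The 1:7 attention-to-SSM ratio is an assumption borrowed from the
# original Jamba, not AI21's published Jamba 2 configuration.

def build_schedule(num_layers: int, attn_every: int = 8) -> list[str]:
    return [
        "attention" if (i + 1) % attn_every == 0 else "ssm"
        for i in range(num_layers)
    ]

print(build_schedule(16))  # one attention layer closes each block of eight
```

Because only the sparse attention layers carry a quadratic cost, the schedule itself determines how close the whole model gets to linear-time inference.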

Benchmark Results

On MMLU-Pro, HumanEval+, and MATH-500, the 398B Jamba 2 scores within striking distance of both GPT-5 and Claude Sonnet 4.5. Where the model pulls ahead is on long-document tasks. With a 256K context window and linear-time SSM layers handling the bulk of long-range processing, Jamba 2 outperforms all competitors on multi-document QA, long-form summarization, and needle-in-a-haystack retrieval at extreme context lengths.

AI21 claims the cost advantage compounds at longer contexts. At 256K tokens, Jamba 2 inference is roughly 8x cheaper than a comparable pure-transformer model because the SSM layers avoid the quadratic attention blowup entirely.
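The claimed figure is at least plausible under a simple back-of-envelope model: if a pure transformer is dominated by attention cost at 256K tokens, keeping only one attention layer in eight cuts that dominant term by roughly 8x. The layer count, width, and mix below are assumptions for the sketch, not AI21's disclosed numbers:

```python
# Back-of-envelope check of the "roughly 8x cheaper at 256K" claim,
# using an illustrative cost model. Layer count, width, state size,
# and the 1-in-8 attention mix are assumptions, not AI21's figures.

def layer_cost(kind: str, n: int, d: int = 4096, s: int = 16) -> float:
    # Quadratic for attention, linear for an SSM scan.
    return 2 * n * n * d if kind == "attention" else 2 * n * d * s

def model_cost(num_layers: int, attn_fraction: float, n: int) -> float:
    attn = int(num_layers * attn_fraction)
    return (attn * layer_cost("attention", n)
            + (num_layers - attn) * layer_cost("ssm", n))

n = 262_144  # 256K-token context
pure = model_cost(64, 1.0, n)      # all-attention transformer
hybrid = model_cost(64, 1 / 8, n)  # 1-in-8 attention, rest SSM
print(f"pure / hybrid cost = {pure / hybrid:.1f}x")
```

At this context length the SSM layers contribute almost nothing to total cost, so the ratio is set almost entirely by how many attention layers survive — consistent with the ~8x figure AI21 reports.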

Three Model Sizes

The Jamba 2 family spans three tiers, from the 398B flagship down to an open-weight Mini release.

The open-weight Mini gives developers and researchers direct access to the hybrid architecture for experimentation and fine-tuning, following the trend set by DeepSeek R2 and other recent open-weight releases.

Why It Matters

Jamba 2 is the strongest evidence yet that pure transformer architectures may not be the final answer. The hybrid SSM-transformer approach addresses the two biggest pain points in LLM deployment — inference cost and long-context performance — without requiring the kind of hardware breakthroughs that GPU manufacturers are racing to deliver. If these efficiency gains hold at scale, other labs will face pressure to adopt similar hybrid designs or explain why they are paying five times more for equivalent results.
