AI21 Labs has released Jamba 2, a 398-billion-parameter model that takes a fundamentally different architectural approach, interleaving Mamba-style state space model (SSM) layers with traditional transformer attention layers. The result matches GPT-5 and Claude Sonnet 4.5 on major reasoning benchmarks while running inference at roughly one-fifth the cost.
How the Hybrid Architecture Works
Pure transformer models compute attention across all tokens in a sequence, creating quadratic scaling costs as context windows grow. Jamba 2 replaces a significant portion of these attention layers with SSM layers based on the Mamba architecture, which process sequences in linear time by maintaining a compressed state representation instead of attending to every previous token.
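The contrast between the two layer types can be sketched in a few lines. This is an illustrative toy, not Mamba's actual selective-scan kernel: the attention function builds the full n×n score matrix (the source of quadratic cost), while the SSM function updates a fixed-size state once per token (linear cost). All function and variable names here are my own.

```python
import numpy as np

def attention(q, k, v):
    # Full causal attention: every token attends to every earlier token.
    # The (n x n) score matrix is what makes cost quadratic in length n.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # causal mask
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def ssm_scan(x, A, B, C):
    # Toy linear-time recurrence: a fixed-size hidden state h is updated
    # once per token, so cost grows linearly with sequence length.
    h = np.zeros(A.shape[0])
    out = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A @ h + B * x_t   # compress all history into the state
        out[t] = C @ h        # read the output from the state
    return out
```

The key asymmetry: attention can look up any exact past token, while the SSM only sees whatever survives in its compressed state, which is why the remaining attention layers handle retrieval-style work.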
The attention layers that remain handle tasks where precise token-to-token relationships matter — retrieval, exact matching, and fine-grained reasoning. The SSM layers handle long-range dependency tracking, summarization, and general language modeling. AI21 reports that this division of labor is what makes the cost reduction possible without sacrificing quality.
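AI21 has not published Jamba 2's exact interleaving ratio, but the idea of a mostly-SSM stack with periodic attention layers can be sketched with a hypothetical schedule builder (the 1-in-8 ratio below is an assumption for illustration, in the spirit of earlier published hybrid designs):

```python
def build_schedule(n_layers: int, attn_every: int = 8) -> list[str]:
    """Interleave attention layers into an otherwise-SSM stack.

    Places one attention layer at the end of every `attn_every`-layer
    block; all other positions are SSM layers. The ratio is a guess,
    not AI21's published configuration.
    """
    return ["attention" if (i % attn_every) == attn_every - 1 else "ssm"
            for i in range(n_layers)]

schedule = build_schedule(32)
# A 32-layer stack gets 4 attention layers among 28 SSM layers.
```

Because most positions are SSM layers, per-token cost stays near-linear in context length, while the sparse attention layers preserve the exact token-to-token lookups described above.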
Benchmark Results
On MMLU-Pro, HumanEval+, and MATH-500, the 398B Jamba 2 scores within striking distance of both GPT-5 and Claude Sonnet 4.5. Where the model pulls ahead is on long-document tasks. With a 256K context window and linear-time SSM layers handling the bulk of long-range processing, Jamba 2 outperforms all competitors on multi-document QA, long-form summarization, and needle-in-a-haystack retrieval at extreme context lengths.
AI21 claims the cost advantage compounds at longer contexts. At 256K tokens, Jamba 2 inference is roughly 8x cheaper than a comparable pure-transformer model because the SSM layers avoid the quadratic attention blowup entirely.
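A back-of-envelope calculation shows why the gap widens with context. The per-layer FLOP estimates below are standard approximations, not AI21's figures, and the hidden-size and state-size values are illustrative assumptions:

```python
def attn_cost(n: int, d_model: int) -> int:
    # Score matrix plus weighted sum: ~2 * n^2 * d_model multiply-adds.
    return 2 * n * n * d_model

def ssm_cost(n: int, d_model: int, d_state: int = 16) -> int:
    # One state update and readout per token: ~2 * n * d_model * d_state.
    return 2 * n * d_model * d_state

d = 8192  # assumed hidden size
for n in (8_192, 65_536, 262_144):
    ratio = attn_cost(n, d) / ssm_cost(n, d)
    print(f"n={n:>7,}: attention / SSM per-layer cost ratio = {ratio:,.0f}x")
```

The per-layer ratio grows linearly with n (it reduces to n / d_state), so at 256K tokens an SSM layer is orders of magnitude cheaper than an attention layer. A real model blends both layer types plus MLPs, which is why the end-to-end advantage AI21 cites is a more modest ~8x rather than the raw per-layer gap.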
Three Model Sizes
The Jamba 2 family includes three tiers:
- Jamba 2 Mini (52B) — Open weights, suitable for self-hosting on multi-GPU setups. Competitive with models in the 70B class on standard benchmarks.
- Jamba 2 (398B) — API-only. The flagship model with full benchmark-matching performance.
- Jamba 2 Ultra — Not yet released. AI21 says it will target the next generation of frontier capabilities.
The open-weight Mini release gives developers and researchers access to the hybrid architecture for experimentation and fine-tuning, following the trend set by DeepSeek R2 and other recent open-weight releases.
Why It Matters
Jamba 2 is the strongest evidence yet that pure transformer architectures may not be the final answer. The hybrid SSM-transformer approach addresses the two biggest pain points in LLM deployment — inference cost and long-context performance — without requiring the kind of hardware breakthroughs that GPU manufacturers are racing to deliver. If these efficiency gains hold at scale, other labs will face pressure to adopt similar hybrid designs or explain why they are paying five times more for equivalent results.