AI21 Labs has released Jamba 2, a 398-billion-parameter model that takes a fundamentally different architectural approach, interleaving Mamba-style state space model (SSM) layers with traditional transformer attention layers. The result matches GPT-5 and Claude Sonnet 4.5 on major reasoning benchmarks while running inference at roughly one-fifth the cost.
How the Hybrid Architecture Works
Pure transformer models compute attention across all tokens in a sequence, creating quadratic scaling costs as context windows grow. Jamba 2 replaces a significant portion of these attention layers with SSM layers based on the Mamba architecture, which process sequences in linear time by maintaining a compressed state representation instead of attending to every previous token.
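The contrast between the two layer types can be sketched in a few lines. This is an illustrative toy, not Mamba's actual selective-scan kernel: the attention function builds the full n×n score matrix (the source of quadratic cost), while the SSM function updates a fixed-size state once per token (linear cost). All function and variable names here are my own.

```python
import numpy as np

def attention(q, k, v):
    # Full causal attention: every token attends to every earlier token.
    # The (n x n) score matrix is what makes cost quadratic in length n.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # causal mask
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def ssm_scan(x, A, B, C):
    # Toy linear-time recurrence: a fixed-size hidden state h is updated
    # once per token, so cost grows linearly with sequence length.
    h = np.zeros(A.shape[0])
    out = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A @ h + B * x_t   # compress all history into the state
        out[t] = C @ h        # read the output from the state
    return out
```

The key asymmetry: attention can look up any exact past token, while the SSM only sees whatever survives in its compressed state, which is why the remaining attention layers handle retrieval-style work.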
The attention layers that remain handle tasks where precise token-to-token relationships matter — retrieval, exact matching, and fine-grained reasoning. The SSM layers handle long-range dependency tracking, summarization, and general language modeling. AI21 reports that this division of labor is what makes the cost reduction possible without sacrificing quality.
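AI21 has not published Jamba 2's exact interleaving ratio, but the idea of a mostly-SSM stack with periodic attention layers can be sketched with a hypothetical schedule builder (the 1-in-8 ratio below is an assumption for illustration, in the spirit of earlier published hybrid designs):

```python
def build_schedule(n_layers: int, attn_every: int = 8) -> list[str]:
    """Interleave attention layers into an otherwise-SSM stack.

    Places one attention layer at the end of every `attn_every`-layer
    block; all other positions are SSM layers. The ratio is a guess,
    not AI21's published configuration.
    """
    return ["attention" if (i % attn_every) == attn_every - 1 else "ssm"
            for i in range(n_layers)]

schedule = build_schedule(32)
# A 32-layer stack gets 4 attention layers among 28 SSM layers.
```

Because most positions are SSM layers, per-token cost stays near-linear in context length, while the sparse attention layers preserve the exact token-to-token lookups described above.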
Benchmark Results
On MMLU-Pro, HumanEval+, and MATH-500, the 398B Jamba 2 scores within striking distance of both GPT-5 and Claude Sonnet 4.5. Where the model pulls ahead is on long-document tasks. With a 256K context window and linear-time SSM layers handling the bulk of long-range processing, Jamba 2 outperforms all competitors on multi-document QA, long-form summarization, and needle-in-a-haystack retrieval at extreme context lengths.
AI21 claims the cost advantage compounds at longer contexts. At 256K tokens, Jamba 2 inference is roughly 8x cheaper than a comparable pure-transformer model because the SSM layers avoid the quadratic attention blowup entirely.
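A back-of-envelope calculation shows why the gap widens with context. The per-layer FLOP estimates below are standard approximations, not AI21's figures, and the hidden-size and state-size values are illustrative assumptions:

```python
def attn_cost(n: int, d_model: int) -> int:
    # Score matrix plus weighted sum: ~2 * n^2 * d_model multiply-adds.
    return 2 * n * n * d_model

def ssm_cost(n: int, d_model: int, d_state: int = 16) -> int:
    # One state update and readout per token: ~2 * n * d_model * d_state.
    return 2 * n * d_model * d_state

d = 8192  # assumed hidden size
for n in (8_192, 65_536, 262_144):
    ratio = attn_cost(n, d) / ssm_cost(n, d)
    print(f"n={n:>7,}: attention / SSM per-layer cost ratio = {ratio:,.0f}x")
```

The per-layer ratio grows linearly with n (it reduces to n / d_state), so at 256K tokens an SSM layer is orders of magnitude cheaper than an attention layer. A real model blends both layer types plus MLPs, which is why the end-to-end advantage AI21 cites is a more modest ~8x rather than the raw per-layer gap.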
Three Model Sizes
The Jamba 2 family includes three tiers:
- Jamba 2 Mini (52B) — Open weights, suitable for self-hosting on multi-GPU setups. Competitive with models in the 70B class on standard benchmarks.
- Jamba 2 (398B) — API-only. The flagship model with full benchmark-matching performance.
- Jamba 2 Ultra — Not yet released. AI21 says it will target the next generation of frontier capabilities.
The open-weight Mini release gives developers and researchers access to the hybrid architecture for experimentation and fine-tuning, following the trend set by DeepSeek R2 and other recent open-weight releases.
Why It Matters
Jamba 2 is the strongest evidence yet that pure transformer architectures may not be the final answer. The hybrid SSM-transformer approach addresses the two biggest pain points in LLM deployment — inference cost and long-context performance — without requiring the kind of hardware breakthroughs that GPU manufacturers are racing to deliver. If these efficiency gains hold at scale, other labs will face pressure to adopt similar hybrid designs or explain why they are paying five times more for equivalent results.