
MiniMax M2.5 Matches Claude Opus 4.6 on Coding Benchmarks — at 1/20th the Cost

Michael Ouroumis · 2 min read

The economics of frontier AI just shifted again. MiniMax, a Chinese AI startup that has more than doubled its sales in the past year, released M2.5 as open weights — and the benchmarks are making the industry take notice.

The Numbers

M2.5 scores 80.2% on SWE-Bench Verified, the real-world software engineering benchmark that has become the gold standard for evaluating coding models. That matches Anthropic's Claude Opus 4.6, currently considered the best coding model available.

On Multi-SWE-Bench, which tests models across multiple repositories simultaneously, M2.5 ranks first at 51.3%.

The catch, or rather the lack of one, is the cost. M2.5 runs at approximately one dollar per hour while generating 100 tokens per second, nearly twice the speed of other frontier models. That works out to roughly 1/20th the cost of running Claude Opus 4.6 through Anthropic's API.
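The back-of-envelope math is easy to check. The figures below use only the numbers quoted in this article ($1/hour, 100 tokens/second); the per-million-token price is derived, not an official rate:

```python
# Derive an effective per-token price from the article's quoted figures.
tokens_per_second = 100
cost_per_hour = 1.00  # USD, approximate figure from the article

tokens_per_hour = tokens_per_second * 3600  # 360,000 tokens per hour
cost_per_million_tokens = cost_per_hour / tokens_per_hour * 1_000_000

print(f"{tokens_per_hour=}")
print(f"${cost_per_million_tokens:.2f} per 1M generated tokens")  # ≈ $2.78
```

At roughly $2.78 per million generated tokens, the "1/20th the cost" claim implies an Opus-class comparison price in the tens of dollars per million tokens, which is the right order of magnitude for premium frontier APIs.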

How It Works

M2.5 is a 230-billion parameter mixture-of-experts model that activates only 10 billion parameters per forward pass. This architecture is what makes the cost equation possible — the model has frontier-level knowledge encoded in its 230 billion parameters but only needs the compute budget of a 10-billion parameter model for each token.
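The sparse-activation idea can be sketched in a few lines. This toy NumPy version is purely illustrative (the dimensions, router, and top-k choice are assumptions, not M2.5's actual architecture); the point is that only the selected experts' weight matrices are multiplied per token, so per-token compute scales with the active parameters, not the total:

```python
import numpy as np

# Toy mixture-of-experts layer: many experts exist, but only top_k run per token.
rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 8, 2  # illustrative sizes, not M2.5's

experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts)) / np.sqrt(d)

def moe_forward(x):
    logits = x @ router                       # one routing score per expert
    chosen = np.argsort(logits)[-top_k:]      # indices of the top-k experts
    weights = np.exp(logits[chosen] - logits[chosen].max())
    weights /= weights.sum()                  # softmax over the chosen experts only
    # Only top_k of the n_experts matrices are multiplied — the compute saving.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

y = moe_forward(rng.standard_normal(d))
print(y.shape)
```

Here 2 of 8 experts fire per token, a 4x compute reduction; scale the same ratio up and you get M2.5's 230B total / 10B active split.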

The model was trained using MiniMax's proprietary Forge reinforcement learning framework and released on Hugging Face under a modified MIT license.

What It Does Well

Beyond raw benchmarks, M2.5 has drawn attention for practical enterprise capabilities. It handles agentic tool use — autonomously calling APIs, writing files, and executing multi-step workflows — at a level that previously required models costing twenty times more.

The model also generates Microsoft Office documents directly, a capability that positions it for enterprise productivity workflows where creating Word documents, Excel spreadsheets, and PowerPoint presentations from natural language is increasingly expected.

The Bigger Implication

MiniMax's thesis is straightforward: the future of AI is not about building the smartest model. It is about building the smartest model that organizations can actually afford to deploy at scale.

When frontier performance is available at commodity pricing, the competitive advantage shifts from model capability to integration, reliability, and domain-specific fine-tuning. Companies that built their AI strategy around a single premium API provider may need to rethink their approach.

The open-weights release means any team can download M2.5 today and start evaluating it against their existing Claude or GPT deployments. For many production workloads, the performance-per-dollar improvement will be hard to ignore.
