MiniMax M2.5 Matches Claude Opus 4.6 on Coding Benchmarks — at 1/20th the Cost
The economics of frontier AI just shifted again. MiniMax, a Chinese AI startup that has more than doubled its sales in the past year, released M2.5 as open weights — and the benchmarks are making the industry take notice.
The Numbers
M2.5 scores 80.2% on SWE-Bench Verified, a human-validated set of real GitHub issues that has become the gold standard for evaluating coding models. That matches Anthropic's Claude Opus 4.6, widely regarded as the strongest coding model currently available.
On Multi-SWE-Bench, which extends SWE-Bench-style issue resolution beyond Python to several other programming languages, M2.5 ranks first at 51.3%.
The catch, or rather the lack of one, is the cost. M2.5 runs at approximately one dollar per hour of continuous generation at 100 tokens per second, nearly twice the generation speed of other frontier models. That works out to roughly 1/20th the cost of running Claude Opus 4.6 through Anthropic's API.
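To put the hourly figure in per-token terms, the arithmetic can be sketched as follows. It assumes the one dollar per hour buys continuous generation at the full 100 tokens per second and counts output tokens only. The function name is illustrative and not part of any MiniMax tooling.

```python
def cost_per_million_tokens(dollars_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly running cost into dollars per million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600  # 100 tok/s -> 360,000 tokens/hour
    return dollars_per_hour / tokens_per_hour * 1_000_000


m25_price = cost_per_million_tokens(dollars_per_hour=1.0, tokens_per_second=100.0)
print(f"{m25_price:.2f}")       # 2.78 dollars per million output tokens

# Price implied for a model 20x more expensive. This is derived from the
# article's 1/20th ratio, not from any published price list.
print(f"{m25_price * 20:.2f}")  # 55.56
```

Real bills also depend on input tokens, caching, and utilization, so this is best read as a rough conversion rather than a price quote.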
How It Works
M2.5 is a 230-billion-parameter mixture-of-experts model that activates only 10 billion parameters per forward pass. This architecture is what makes the cost equation possible: the model has frontier-level knowledge encoded in its 230 billion parameters but needs roughly the compute budget of a 10-billion-parameter model for each token.
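The idea of activating only a few experts per token can be illustrated with a generic top-k router. This is a minimal sketch of the standard technique, not MiniMax's implementation: the article does not describe M2.5's routing, and the expert count, the value of k, and the toy experts below are all hypothetical.

```python
import math


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their gate weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x: float, experts, router_logits: list[float], k: int) -> float:
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = top_k_route(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Toy setup: 8 experts with 2 active per token (hypothetical, not M2.5's configuration).
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]

gates = top_k_route(logits, k=2)
print(sorted(gates))      # [1, 4]: only these two experts are evaluated
print(f"{10 / 230:.1%}")  # 4.3%: share of M2.5's parameters active per token
```

The six experts that are not selected are never evaluated, which is where the compute saving comes from. All 230 billion parameters still have to be held in memory, so the saving is in per-token compute rather than in hardware footprint.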
The model was trained using MiniMax's proprietary Forge reinforcement learning framework and released on Hugging Face under a modified MIT license.
What It Does Well
Beyond raw benchmarks, M2.5 has drawn attention for practical enterprise capabilities. It handles agentic tool use — autonomously calling APIs, writing files, and executing multi-step workflows — at a level that previously required models costing twenty times more.
The model also generates Microsoft Office documents directly, a capability that positions it for enterprise productivity workflows where creating Word documents, Excel spreadsheets, and PowerPoint presentations from natural language is increasingly expected.
The Bigger Implication
MiniMax's thesis is straightforward: the future of AI is not about building the smartest model. It is about building the smartest model that organizations can actually afford to deploy at scale.
When frontier performance is available at commodity pricing, the competitive advantage shifts from model capability to integration, reliability, and domain-specific fine-tuning. Companies that built their AI strategy around a single premium API provider may need to rethink their approach.
The open-weights release means any team can download M2.5 today and start evaluating it against their existing Claude or GPT deployments. For many production workloads, the performance-per-dollar improvement will be hard to ignore.
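One way to frame such an evaluation is cost per solved task rather than raw pass rate. The sketch below is a simplified model under stated assumptions: attempts are independent, both models use the same number of tokens per task, and the helper names are illustrative. The 80.2% pass rate and the 1/20 price ratio come from the figures above; everything else is a placeholder to be replaced with measurements from a team's own workload.

```python
def cost_per_solved_task(pass_rate: float, tokens_per_task: float, price_per_million: float) -> float:
    """Expected spend per successful task, assuming independent attempts."""
    cost_per_attempt = tokens_per_task / 1_000_000 * price_per_million
    return cost_per_attempt / pass_rate


def breakeven_pass_rate(incumbent_pass_rate: float, price_ratio: float) -> float:
    """Pass rate at which the cheaper model matches the incumbent's cost per solved task."""
    return incumbent_pass_rate * price_ratio


# With an incumbent at 80.2% and a candidate at 1/20th the price, the candidate
# only needs to clear this pass rate to cost less per solved task.
print(f"{breakeven_pass_rate(0.802, 1 / 20):.3f}")  # 0.040

# Hypothetical workload: 50,000 tokens per task, prices in relative units (20 vs 1).
incumbent = cost_per_solved_task(pass_rate=0.802, tokens_per_task=50_000, price_per_million=20.0)
candidate = cost_per_solved_task(pass_rate=0.802, tokens_per_task=50_000, price_per_million=1.0)
print(f"{incumbent / candidate:.1f}")  # 20.0
```

This ignores latency, engineering time spent on failed attempts, and differences in token usage between models, any of which can matter more than the per-token price on a given workload.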



