Back to stories
Models

GLM-5.1 Cracks Code Arena Top 3, First Open-Weight Model to Do So

Michael Ouroumis2 min read
GLM-5.1 Cracks Code Arena Top 3, First Open-Weight Model to Do So

Z.ai's GLM-5.1 has become the first open-weight model to reach the top three on Code Arena, the human-judged coding leaderboard that has long been dominated by closed frontier systems from Anthropic, OpenAI and Google. The Chinese lab — formerly known as Zhipu AI — confirmed the milestone this week after its new flagship posted a 1530 Elo score on the board.

A new benchmark for open models

Released on April 7, 2026 under the permissive MIT License on Hugging Face, GLM-5.1 is a 754-billion-parameter Mixture-of-Experts model targeting agentic engineering and long-horizon coding work. The Code Arena result builds on the GLM-5.1 launch from Z.AI in late March, which early reviewers were already calling the best open agentic model available. Its Code Arena placement on April 10 puts it behind only Anthropic's claude-opus-4-6-thinking (1548) and claude-opus-4-6 (1542), while sitting ahead of every GPT-5 and Gemini 3 model on the agentic webdev leaderboard.

The jump is unusually large for a point release. Reports indicate GLM-5.1 gained roughly 90 Elo points over GLM-5 and around 100 over Kimi K2.5 Thinking, the prior best open entrant. On SWE-Bench Pro, third-party coverage places GLM-5.1 ahead of both Claude Opus 4.6 and GPT-5.4 on a number of tasks.

Why Code Arena matters

Unlike synthetic benchmarks, Code Arena ranks models through head-to-head comparisons judged by developers on real coding prompts. A top-three finish there is harder to game than a leaderboard saturated with closed-book multiple-choice questions, and it is the metric many engineering teams watch when picking a default coding model.

Open weights, permissive license

The release is distributed on Hugging Face at zai-org/GLM-5.1 under the MIT License — one of the most permissive in open source. Enterprises and researchers can fine-tune, modify and redistribute the weights commercially without royalties, which removes one of the last structural advantages frontier labs have enjoyed against open alternatives.

That combination — frontier-tier ranking plus an unrestricted license — is what makes this release different from previous open-weight milestones. Earlier Chinese-origin models such as DeepSeek-V3 and Qwen built strong benchmark stories, but none had cleared Code Arena's top three.

Implications for the frontier

For closed-lab incumbents, GLM-5.1 puts pressure on pricing power. When a freely downloadable model beats GPT-5.4 and Gemini 3.1 Pro on a human-judged coding board, the premium enterprises pay for API access has to be justified on things other than raw capability — latency, safety tooling, context length, or integration depth.

For open-source advocates, the release is a proof point that the capability gap between open and closed frontier models has narrowed materially in 2026. The question now is whether U.S. and European labs respond by accelerating their own open-weight releases, or by leaning further into proprietary agentic tooling where model weights alone no longer decide the winner.

GLM-5.1 is available today on Hugging Face, and Z.ai is hosting the model through its own API for teams that prefer not to self-host.

Learn AI for Free — FreeAcademy.ai

Take "AI Essentials: Understanding AI in 2026" — a free course with certificate to master the skills behind this story.

More in Models

Microsoft's MAI-Image-2.5 Hits No. 3 on Arena, Level With Google's Nano Banana 2
Models

Microsoft's MAI-Image-2.5 Hits No. 3 on Arena, Level With Google's Nano Banana 2

Microsoft's in-house MAI-Image-2.5 debuted at No. 3 on LMArena's text-to-image leaderboard, matching Google's Nano Banana 2 and trailing only OpenAI's Image-2 — and it's headed to Microsoft Foundry within two weeks.

1 hours ago2 min read
Google's Gemini 3.5 Flash Beats the Pro Tier on Agent Benchmarks — and Ships a Managed Agents API
Models

Google's Gemini 3.5 Flash Beats the Pro Tier on Agent Benchmarks — and Ships a Managed Agents API

At I/O 2026 Google shipped Gemini 3.5 Flash, a Flash-tier model that outscores Gemini 3.1 Pro on coding and agentic benchmarks at less than half the cost of comparable frontier models, alongside a Managed Agents API that spins up tool-using, code-executing agents in a single call.

1 week ago2 min read
Google ships Gemini 3.2 Flash at I/O 2026, undercuts GPT-5.5 by 15-20x on inference cost
Models

Google ships Gemini 3.2 Flash at I/O 2026, undercuts GPT-5.5 by 15-20x on inference cost

Gemini 3.2 Flash debuts at $0.25/M input and $2.00/M output tokens, hitting ~92% of GPT-5.5 on coding and reasoning while rolling out across Search, Maps, Gmail, and Chrome simultaneously.

1 week ago2 min read