Z.ai's GLM-5.1 has become the first open-weight model to reach the top three on Code Arena, the human-judged coding leaderboard that has long been dominated by closed frontier systems from Anthropic, OpenAI and Google. The Chinese lab — formerly known as Zhipu AI — confirmed the milestone this week after its new flagship posted a 1530 Elo score on the board.
A new benchmark for open models
Released on April 7, 2026 under the permissive MIT License on Hugging Face, GLM-5.1 is a 754-billion-parameter Mixture-of-Experts model targeting agentic engineering and long-horizon coding work. Its Code Arena placement on April 10 puts it behind only Anthropic's claude-opus-4-6-thinking (1548) and claude-opus-4-6 (1542), while sitting ahead of every GPT-5 and Gemini 3 model on the agentic webdev leaderboard.
The jump is unusually large for a point release. Reports indicate GLM-5.1 gained roughly 90 Elo points over GLM-5 and sits around 100 points above Kimi K2.5 Thinking, the previous best open entrant. On SWE-Bench Pro, third-party coverage places GLM-5.1 ahead of both Claude Opus 4.6 and GPT-5.4 on a number of tasks.
Why Code Arena matters
Unlike synthetic benchmarks, Code Arena ranks models through head-to-head comparisons judged by developers on real coding prompts. A top-three finish there is harder to game than a leaderboard saturated with closed-book multiple-choice questions, and it is the metric many engineering teams watch when picking a default coding model.
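Because Code Arena is built on pairwise human judgments, its Elo scores translate directly into expected head-to-head win rates. As a rough sanity check on the gaps cited above (this is the standard Elo expectation formula, not anything Code Arena itself publishes):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# GLM-5.1 (1530) vs. a model roughly 90 Elo below it, such as its predecessor:
print(f"{expected_score(1530, 1440):.1%}")  # about a 63% expected win rate

# GLM-5.1 (1530) vs. claude-opus-4-6-thinking (1548):
print(f"{expected_score(1530, 1548):.1%}")  # about 47%, i.e. near parity
```

In other words, a ~90-point lead means winning roughly two of every three matchups, while the 18-point gap to the top spot is close to a coin flip.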
Open weights, permissive license
The release is distributed on Hugging Face at zai-org/GLM-5.1 under the MIT License — one of the most permissive in open source. Enterprises and researchers can fine-tune, modify and redistribute the weights commercially without royalties, which removes one of the last structural advantages frontier labs have enjoyed against open alternatives.
That combination — frontier-tier ranking plus an unrestricted license — is what makes this release different from previous open-weight milestones. Earlier Chinese-origin models such as DeepSeek-V3 and Qwen posted strong benchmark results, but none had cracked Code Arena's top three.
Implications for the frontier
For closed-lab incumbents, GLM-5.1 puts pressure on pricing power. When a freely downloadable model beats GPT-5.4 and Gemini 3.1 Pro on a human-judged coding board, the premium enterprises pay for API access has to be justified on things other than raw capability — latency, safety tooling, context length, or integration depth.
For open-source advocates, the release is a proof point that the capability gap between open and closed frontier models has narrowed materially in 2026. The question now is whether U.S. and European labs respond by accelerating their own open-weight releases, or by leaning further into proprietary agentic tooling where model weights alone no longer decide the winner.
GLM-5.1 is available today on Hugging Face, and Z.ai is hosting the model through its own API for teams that prefer not to self-host.
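For teams weighing the hosted option, the sketch below shows the general shape of a chat-completion request. It assumes Z.ai's API follows the OpenAI-compatible convention common among model hosts; the model identifier, message format, and parameter names here are illustrative assumptions, not confirmed details — check Z.ai's API documentation before relying on them.

```python
import json

# Hypothetical OpenAI-compatible request body. The model name and all
# parameters are assumptions for illustration; the actual endpoint URL,
# auth scheme, and field names come from Z.ai's API docs.
def build_chat_request(prompt: str, model: str = "glm-5.1") -> dict:
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a coding assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,  # low temperature for more deterministic code output
    }

body = build_chat_request("Write a function that reverses a linked list.")
print(json.dumps(body, indent=2))
```

Self-hosters would skip this entirely and serve the downloaded weights behind their own endpoint.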