Back to stories
Models

GLM-5.1 Cracks Code Arena Top 3, First Open-Weight Model to Do So

Michael Ouroumis2 min read
GLM-5.1 Cracks Code Arena Top 3, First Open-Weight Model to Do So

Z.ai's GLM-5.1 has become the first open-weight model to reach the top three on Code Arena, the human-judged coding leaderboard that has long been dominated by closed frontier systems from Anthropic, OpenAI and Google. The Chinese lab — formerly known as Zhipu AI — confirmed the milestone this week after its new flagship posted a 1530 Elo score on the board.

A new benchmark for open models

Released on April 7, 2026 under the permissive MIT License on Hugging Face, GLM-5.1 is a 754-billion-parameter Mixture-of-Experts model targeting agentic engineering and long-horizon coding work. Its Code Arena placement on April 10 puts it behind only Anthropic's claude-opus-4-6-thinking (1548) and claude-opus-4-6 (1542), while sitting ahead of every GPT-5 and Gemini 3 model on the agentic webdev leaderboard.

The jump is unusually large for a point release. Reports indicate GLM-5.1 gained roughly 90 Elo points over GLM-5 and around 100 over Kimi K2.5 Thinking, the prior best open entrant. On SWE-Bench Pro, third-party coverage places GLM-5.1 ahead of both Claude Opus 4.6 and GPT-5.4 on a number of tasks.

Why Code Arena matters

Unlike synthetic benchmarks, Code Arena ranks models through head-to-head comparisons judged by developers on real coding prompts. A top-three finish there is harder to game than a leaderboard saturated with closed-book multiple-choice questions, and it is the metric many engineering teams watch when picking a default coding model.

Open weights, permissive license

The release is distributed on Hugging Face at zai-org/GLM-5.1 under the MIT License — one of the most permissive in open source. Enterprises and researchers can fine-tune, modify and redistribute the weights commercially without royalties, which removes one of the last structural advantages frontier labs have enjoyed against open alternatives.

That combination — frontier-tier ranking plus an unrestricted license — is what makes this release different from previous open-weight milestones. Earlier Chinese-origin models such as DeepSeek-V3 and Qwen built strong benchmark stories, but none had cleared Code Arena's top three.

Implications for the frontier

For closed-lab incumbents, GLM-5.1 puts pressure on pricing power. When a freely downloadable model beats GPT-5.4 and Gemini 3.1 Pro on a human-judged coding board, the premium enterprises pay for API access has to be justified on things other than raw capability — latency, safety tooling, context length, or integration depth.

For open-source advocates, the release is a proof point that the capability gap between open and closed frontier models has narrowed materially in 2026. The question now is whether U.S. and European labs respond by accelerating their own open-weight releases, or by leaning further into proprietary agentic tooling where model weights alone no longer decide the winner.

GLM-5.1 is available today on Hugging Face, and Z.ai is hosting the model through its own API for teams that prefer not to self-host.

Learn AI for Free — FreeAcademy.ai

Take "AI Essentials: Understanding AI in 2026" — a free course with certificate to master the skills behind this story.

More in Models

Thinking Machines Lab Debuts 'Interaction Models' — Mira Murati's First Step Into Frontier AI
Models

Thinking Machines Lab Debuts 'Interaction Models' — Mira Murati's First Step Into Frontier AI

Mira Murati's Thinking Machines Lab released a research preview of 'interaction models,' a new class of full-duplex multimodal AI that listens, sees and speaks at the same time, with turn-taking latency reported at about 0.4 seconds.

3 days ago2 min read
OpenAI Ships GPT-Realtime-2 With Live Translation and Streaming Whisper, Pushing Voice Agents Toward GPT-5 Reasoning
Models

OpenAI Ships GPT-Realtime-2 With Live Translation and Streaming Whisper, Pushing Voice Agents Toward GPT-5 Reasoning

OpenAI launched three new audio models on May 7 — GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper — adding GPT-5-class reasoning, a 128K context window, and metered live translation across 70 languages to its already-GA Realtime API.

6 days ago2 min read
Zyphra Releases ZAYA1-8B, the First Frontier-Class Reasoning MoE Trained Entirely on AMD
Models

Zyphra Releases ZAYA1-8B, the First Frontier-Class Reasoning MoE Trained Entirely on AMD

Zyphra's ZAYA1-8B is an Apache-licensed reasoning MoE with under 1B active parameters that matches Claude 4.5 Sonnet and Gemini 2.5 Pro on math benchmarks — and it was pretrained, midtrained, and fine-tuned on AMD Instinct MI300X GPUs.

1 week ago2 min read