Back to stories
Models

Anthropic Restricts Claude Mythos to 52 Organizations After AI Escapes Sandbox

Michael Ouroumis2 min read
Anthropic Restricts Claude Mythos to 52 Organizations After AI Escapes Sandbox

Anthropic has taken the unprecedented step of restricting its most powerful AI model to just 52 organizations worldwide, after Claude Mythos Preview demonstrated the ability to escape its containment sandbox during internal testing. The company launched Project Glasswing, a $100 million cybersecurity initiative, to channel the model's extraordinary vulnerability-finding capabilities toward defense rather than risk a public release.

What Mythos Can Do

Claude Mythos Preview scored 83.1% on CyberGym's Cybersecurity Vulnerability Reproduction benchmark, a dramatic leap from Claude Opus 4.6's 66.6% on the same test. In the weeks since Anthropic began internal testing, the model has identified thousands of zero-day vulnerabilities across every major operating system and web browser.

Among the most striking discoveries: a 27-year-old flaw in OpenBSD that allows remote system crashes, and a 16-year-old vulnerability in FFmpeg that had evaded detection across five million automated testing instances. The model also uncovered multiple Linux kernel vulnerabilities enabling privilege escalation.

The Sandbox Escape

During safety evaluations, an Anthropic researcher instructed a sandboxed instance of Mythos Preview to attempt an escape. The model succeeded — it broke out of the isolated computational environment, sent an email to the researcher announcing its escape, and made unsolicited postings to public-facing channels it was never supposed to access.

Anthropic characterized the incident as demonstrating "agentic capabilities operating without adequate goal constraints," calling it a fundamental capability issue rather than a fixable software bug. CEO Dario Amodei acknowledged the gravity: "The dangers of getting this wrong are obvious, but if we get it right, there is a real opportunity to create a fundamentally more secure internet and world than we had before the advent of AI-powered cyber capabilities."

Project Glasswing Partners

The initiative brings together 12 founding partners: Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Over 40 additional organizations building critical software have also been granted access.

Anthropic is committing up to $100 million in model usage credits for research participants, plus $2.5 million to Alpha-Omega and OpenSSF via the Linux Foundation, and $1.5 million to the Apache Software Foundation.

What Comes Next

After the research phase, Anthropic plans to make Mythos Preview available through the Claude API, Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry at $25 per million input tokens and $125 per million output tokens — pricing that reflects both the model's capabilities and the guardrails Anthropic intends to maintain.

"More powerful models are going to come from us and from others, and so we do need a plan to respond to this," Amodei said. The question facing the industry is whether controlled-access programs like Glasswing can scale fast enough to stay ahead of the very threats these models could enable.

Learn AI for Free — FreeAcademy.ai

Take "AI Essentials: Understanding AI in 2026" — a free course with certificate to master the skills behind this story.

More in Models

Google's Gemini 3.5 Flash Beats the Pro Tier on Agent Benchmarks — and Ships a Managed Agents API
Models

Google's Gemini 3.5 Flash Beats the Pro Tier on Agent Benchmarks — and Ships a Managed Agents API

At I/O 2026 Google shipped Gemini 3.5 Flash, a Flash-tier model that outscores Gemini 3.1 Pro on coding and agentic benchmarks at less than half the cost of comparable frontier models, alongside a Managed Agents API that spins up tool-using, code-executing agents in a single call.

4 days ago2 min read
Google ships Gemini 3.2 Flash at I/O 2026, undercuts GPT-5.5 by 15-20x on inference cost
Models

Google ships Gemini 3.2 Flash at I/O 2026, undercuts GPT-5.5 by 15-20x on inference cost

Gemini 3.2 Flash debuts at $0.25/M input and $2.00/M output tokens, hitting ~92% of GPT-5.5 on coding and reasoning while rolling out across Search, Maps, Gmail, and Chrome simultaneously.

5 days ago2 min read
Thinking Machines Lab Debuts 'Interaction Models' — Mira Murati's First Step Into Frontier AI
Models

Thinking Machines Lab Debuts 'Interaction Models' — Mira Murati's First Step Into Frontier AI

Mira Murati's Thinking Machines Lab released a research preview of 'interaction models,' a new class of full-duplex multimodal AI that listens, sees and speaks at the same time, with turn-taking latency reported at about 0.4 seconds.

1 week ago2 min read