Back to stories
Tools

Microsoft's MDASH AI Hunts Windows Bugs — Tops CyberGym Benchmark and Catches 16 Flaws Humans Missed

Michael Ouroumis2 min read
Microsoft's MDASH AI Hunts Windows Bugs — Tops CyberGym Benchmark and Catches 16 Flaws Humans Missed

Microsoft has unveiled MDASH, a multi-model agentic AI system designed to hunt software vulnerabilities at machine speed — and on its debut it quietly found 16 Windows bugs that shipped fixes in this week's Patch Tuesday, including four critical remote code execution flaws.

Announced via Microsoft's Security Blog on May 12, MDASH — short for multi-model agentic scanning harness — orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models. The system runs a five-stage pipeline (Prepare, Scan, Validate, Dedup, Prove) that prepares a target codebase, scans for possible weaknesses, validates findings through a separate set of debating agents, removes duplicates, and where possible proves a flaw is exploitable by generating triggering inputs.

A different bet on agentic security

Microsoft's framing is notable: the company is not pitching a single "super-model" but a harness around many of them. "The harness does the work, and the model is one input," said Taesoo Kim, Microsoft's Vice President of Agentic Security, in the company's blog post. The implication is that vulnerability research benefits more from coordinated reasoning across diverse models than from any single frontier system.

That positioning lands at a pointed moment. Coverage in GeekWire and Neowin reports that MDASH outscored Anthropic's Mythos model — the cybersecurity-focused system Anthropic has been gating to a narrow set of enterprise partners — and OpenAI's GPT-5.5 on the same benchmark.

Benchmark dominance

On the public CyberGym benchmark, which contains 1,507 real-world vulnerability reproduction tasks, Microsoft reported an 88.45% success rate, placing MDASH at the top of the leaderboard with roughly a five-point lead over the next entry. On an internal test harness containing 21 deliberately planted vulnerabilities, MDASH reportedly found all 21 with zero false positives. Retrospective runs against historical Microsoft components hit 96% recall on Common Log File System bugs and 100% on tcpip.sys cases.

What it actually found

Microsoft attributes 16 CVEs in its May 12, 2026 Patch Tuesday to MDASH — four critical-rated and 12 important — spanning components including tcpip.sys, ikeext.dll, http.sys, netlogon.dll, dnsapi.dll, and telnet.exe. Two of the most consequential findings, The Hacker News reported, include a kernel use-after-free reachable via SSRR and an IKEv2 double-free that could enable LocalSystem remote code execution.

Implications

The disclosure sharpens a trend the industry has been circling for months: defenders are starting to ship AI systems whose throughput simply outpaces human researchers. That cuts both ways. The same agentic harnesses that find bugs before attackers do can also, in the wrong hands, accelerate exploitation. For enterprise buyers, MDASH's debut also reframes the conversation around Anthropic's tightly gated Mythos model — suggesting the agentic-security race is not a one-horse field, and that orchestration may matter as much as raw model capability.

Learn AI for Free — FreeAcademy.ai

Take "Prompt Engineering Practice" — a free course with certificate to master the skills behind this story.

More in Tools

Higgsfield launches Supercomputer, a cloud-native AI agent for end-to-end creative production
Tools

Higgsfield launches Supercomputer, a cloud-native AI agent for end-to-end creative production

Higgsfield AI unveiled Supercomputer, a self-learning cloud-native agent that orchestrates frontier LLMs and video models across 40+ tools to execute full creative campaigns from a single prompt.

4 hours ago3 min read
Rivian Rolls Out 'Hey Rivian' AI Assistant in 2026.15 Update With Google Calendar Sync
Tools

Rivian Rolls Out 'Hey Rivian' AI Assistant in 2026.15 Update With Google Calendar Sync

Rivian's AI-powered Rivian Assistant and Unified Intelligence platform begin rolling out to Gen 1 and Gen 2 R1T and R1S owners with active Connect+ subscriptions, marking the EV maker's pivot from software-defined to AI-defined vehicles.

1 day ago2 min read
OpenAI Launches 'Daybreak' Cybersecurity Platform to Find and Fix Bugs Before Attackers Do
Tools

OpenAI Launches 'Daybreak' Cybersecurity Platform to Find and Fix Bugs Before Attackers Do

OpenAI unveiled Daybreak, a security initiative built on GPT-5.5 and a Codex Security agent that scans code, models threats, and validates patches — positioning it head-to-head with Anthropic's Mythos.

3 days ago2 min read