What is Microsoft MDASH?

MDASH (Microsoft Security multi-model agentic scanning harness) is an automated vulnerability discovery system that orchestrates more than 100 specialized AI agents across frontier and distilled models to find, validate, and prove exploitable bugs in large codebases like Windows.

How many Windows vulnerabilities did MDASH find?

MDASH discovered 16 Windows CVEs that Microsoft patched in its May 12, 2026 Patch Tuesday release, including four rated critical — among them remote code execution flaws in IKEv2 and TCP/IP — without any human researcher identifying them first.

How does MDASH compare to other AI systems on benchmarks?

On the public CyberGym benchmark of 1,507 real-world vulnerability tasks, MDASH scored 88.45%, the top entry on the leaderboard and roughly five points ahead of the next system, according to Microsoft.

Microsoft's MDASH AI Hunts Windows Bugs — Tops CyberGym Benchmark and Catches 16 Flaws Humans Missed

Microsoft has unveiled MDASH, a multi-model agentic AI system designed to hunt software vulnerabilities at machine speed — and on its debut it quietly found 16 Windows bugs that shipped fixes in this week's Patch Tuesday, including four critical remote code execution flaws.

Announced via Microsoft's Security Blog on May 12, MDASH — short for multi-model agentic scanning harness — orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models. The system runs a five-stage pipeline (Prepare, Scan, Validate, Dedup, Prove) that prepares a target codebase, scans for possible weaknesses, validates findings through a separate set of debating agents, removes duplicates, and where possible proves a flaw is exploitable by generating triggering inputs.

A different bet on agentic security

Microsoft's framing is notable: the company is not pitching a single "super-model" but a harness around many of them. "The harness does the work, and the model is one input," said Taesoo Kim, Microsoft's Vice President of Agentic Security, in the company's blog post. The implication is that vulnerability research benefits more from coordinated reasoning across diverse models than from any single frontier system.

That positioning lands at a pointed moment. Coverage in GeekWire and Neowin reports that MDASH outscored Anthropic's Mythos model — the cybersecurity-focused system Anthropic has been gating to a narrow set of enterprise partners — and OpenAI's GPT-5.5 on the same benchmark.

Benchmark dominance

On the public CyberGym benchmark, which contains 1,507 real-world vulnerability reproduction tasks, Microsoft reported an 88.45% success rate, placing MDASH at the top of the leaderboard with roughly a five-point lead over the next entry. On an internal test harness containing 21 deliberately planted vulnerabilities, MDASH reportedly found all 21 with zero false positives. Retrospective runs against historical Microsoft components hit 96% recall on Common Log File System bugs and 100% on tcpip.sys cases.

What it actually found

Microsoft attributes 16 CVEs in its May 12, 2026 Patch Tuesday to MDASH — four critical-rated and 12 important — spanning components including tcpip.sys, ikeext.dll, http.sys, netlogon.dll, dnsapi.dll, and telnet.exe. Two of the most consequential findings, The Hacker News reported, include a kernel use-after-free reachable via SSRR and an IKEv2 double-free that could enable LocalSystem remote code execution.

Implications

The disclosure sharpens a trend the industry has been circling for months: defenders are starting to ship AI systems whose throughput simply outpaces human researchers. That cuts both ways. The same agentic harnesses that find bugs before attackers do can also, in the wrong hands, accelerate exploitation. For enterprise buyers, MDASH's debut also reframes the conversation around Anthropic's tightly gated Mythos model — suggesting the agentic-security race is not a one-horse field, and that orchestration may matter as much as raw model capability.

Microsoft's MDASH AI Hunts Windows Bugs — Tops CyberGym Benchmark and Catches 16 Flaws Humans Missed

A different bet on agentic security

Benchmark dominance

What it actually found

Implications

More in Tools

Higgsfield launches Supercomputer, a cloud-native AI agent for end-to-end creative production

Rivian Rolls Out 'Hey Rivian' AI Assistant in 2026.15 Update With Google Calendar Sync

OpenAI Launches 'Daybreak' Cybersecurity Platform to Find and Fix Bugs Before Attackers Do