DeepMind's AlphaProof Nexus Cracks 9 Open Erdős Problems With Lean-Verified Proofs

Michael Ouroumis · 2 min read

Google DeepMind's AlphaProof Nexus has generated machine-verified proofs for 9 of 353 open problems in the Erdős catalog, two of which had stood unsolved for 56 years, alongside 44 of 492 open conjectures from the On-Line Encyclopedia of Integer Sequences (OEIS). According to the team's arXiv preprint (2605.22763), posted May 21, each solution cost only a few hundred dollars in inference, and every proof was checked in the Lean formal proof assistant rather than left to human referees.

Gemini 3.1 Pro inside a Lean verification loop

The system pairs Gemini 3.1 Pro with Lean, compiling each candidate argument into a machine-checkable formal proof. That design choice is the headline for practitioners: instead of emitting natural-language arguments that need expensive expert review — and that can hide subtle errors — the agent only counts a problem as solved once Lean confirms the proof type-checks. DeepMind has published the Lean artifacts and selected natural-language write-ups in a public GitHub repository, making the results independently reproducible.
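To make "Lean confirms the proof type-checks" concrete, here is a toy Lean 4 statement with its proof. It is purely illustrative and is not taken from the AlphaProof Nexus repository. The theorem name is made up for this sketch, while `Nat.add_comm` is a lemma from Lean's core library.

```lean
-- Toy example only: a statement and a proof term that Lean's kernel can check.
-- If the proof term did not match the stated proposition, compilation would fail.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The point is the binary outcome: the file either compiles, in which case the statement is proved relative to Lean's kernel, or it is rejected with an error.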

DeepMind veteran David Silver, who led the lab's reinforcement-learning research through January 2026, has framed mathematics as an ideal proving ground for this approach because it is fully digital, self-verifying, and amenable to experience-driven improvement loops. The formal checker supplies an unambiguous reward signal that most agentic domains lack.

A pointed contrast with OpenAI

The release landed roughly a day after OpenAI publicized its own Erdős result, and the framing was deliberate. OpenAI's earlier claim drew criticism that its model had surfaced existing references to already-solved problems rather than constructing anything new; Demis Hassabis reportedly called that episode "embarrassing." DeepMind's pitch is that Lean-checked novelty removes the ambiguity — a proof either compiles or it does not.

Why the verification loop matters for builders

The technical lesson generalizes well beyond number theory. The bottleneck in agentic reasoning is rarely generating plausible output; it is trusting it. AlphaProof Nexus shows what happens when you wrap a frontier model in a hard, automated verifier: the system can grind through hundreds of attempts, discard the ones that fail the checker, and ship only what is provably correct — at a cost low enough, a few hundred dollars per result, to run at scale.
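The generate, check, and discard pattern described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not DeepMind's implementation: the random proposer stands in for a model call, the divisibility check stands in for Lean, and every name here (`solve_with_verifier`, `propose_factor`, `is_nontrivial_factor`) is hypothetical.

```python
import random
from typing import Callable, Optional, Tuple


def solve_with_verifier(
    propose: Callable[[], int],
    verify: Callable[[int], bool],
    max_attempts: int,
) -> Tuple[Optional[int], int]:
    """Return the first candidate the verifier accepts and the attempts used.

    Returns (None, max_attempts) if no candidate passes within the budget.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = propose()      # cheap, possibly wrong guess
        if verify(candidate):      # hard, automated check
            return candidate, attempt
        # A failed candidate is simply discarded; nothing unverified is kept.
    return None, max_attempts


# Toy problem (an assumption for illustration): find a nontrivial factor of N.
N = 91
rng = random.Random(0)  # seeded so repeated runs behave the same way


def propose_factor() -> int:
    # Stand-in for a model call: an unreliable guesser.
    return rng.randint(2, N - 1)


def is_nontrivial_factor(c: int) -> bool:
    # Stand-in for the formal checker: fast and unambiguous.
    return 1 < c < N and N % c == 0


result, attempts_used = solve_with_verifier(
    propose_factor, is_nontrivial_factor, max_attempts=1000
)
print(f"accepted candidate: {result} after {attempts_used} attempt(s)")
```

The design choice worth noting is that the proposer's quality affects only cost (attempts consumed), never correctness: anything returned has passed the verifier, and an exhausted budget yields `None` rather than an unchecked guess.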

For teams building agents in any domain with a formal oracle — theorem provers, type systems, test suites, constraint solvers, query planners — the takeaway is that verifier-in-the-loop architectures are now producing research-grade output, not toy demos. The economics reframe inference spend, too: a few hundred dollars to settle a 56-year-old open problem is a strong argument for pointing compute at verifiable reasoning rather than open-ended generation.
