What model powers AlphaProof Nexus?

It pairs Gemini 3.1 Pro with the Lean formal proof assistant, which type-checks every generated proof before it counts as solved.

What did it solve, and at what cost?

It produced machine-verified proofs for 9 of 353 open Erdős problems, two of which had been open for 56 years, plus 44 of 492 open OEIS conjectures, at a few hundred dollars of inference per problem.

How does this differ from OpenAI's Erdős claim?

DeepMind's proofs are formally checked in Lean and published as reproducible artifacts, whereas OpenAI's earlier claim was criticized for surfacing references to already-solved problems rather than constructing novel proofs.

DeepMind's AlphaProof Nexus Cracks 9 Open Erdős Problems With Lean-Verified Proofs

Google DeepMind's AlphaProof Nexus has generated machine-verified proofs for 9 of 353 open problems in the Erdős catalog, two of which had stood unsolved for 56 years, alongside 44 of 492 open conjectures from the On-Line Encyclopedia of Integer Sequences (OEIS). According to the team's arXiv preprint (2605.22763), posted May 21, each solution cost only a few hundred dollars in inference, and every proof was checked in the Lean formal proof assistant rather than left to human referees.

Gemini 3.1 Pro inside a Lean verification loop

The system pairs Gemini 3.1 Pro with Lean, compiling each candidate argument into a machine-checkable formal proof. That design choice is the headline for practitioners: instead of emitting natural-language arguments that need expensive expert review — and that can hide subtle errors — the agent only counts a problem as solved once Lean confirms the proof type-checks. DeepMind has published the Lean artifacts and selected natural-language write-ups in a public GitHub repository, making the results independently reproducible.
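To make "type-checks" concrete, here is a toy Lean 4 example. It is not taken from DeepMind's repository; the theorem names are hypothetical and the statement is deliberately trivial. It only illustrates that Lean accepts a theorem when the supplied proof has exactly the stated type, and rejects it otherwise.

```lean
-- Toy statement (not from the paper): addition on natural numbers commutes.
-- Lean accepts the theorem only because the proof term has exactly the stated type.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A wrong "proof" is rejected at compile time. Uncommenting the lines below
-- produces a type mismatch (the term proves a * b = b * a instead),
-- so a checker-gated system would never count this claim as solved.
-- theorem toy_wrong (a b : Nat) : a + b = b + a :=
--   Nat.mul_comm a b
```

The point for practitioners is that acceptance is binary and automatic: no reviewer has to judge whether the argument is convincing.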

DeepMind veteran David Silver, who led the lab's reinforcement-learning research through January 2026, has framed mathematics as an ideal proving ground for this approach because it is fully digital, self-verifying, and amenable to experience-driven improvement loops. The formal checker supplies an unambiguous reward signal that most agentic domains lack.

A pointed contrast with OpenAI

The release landed roughly a day after OpenAI publicized its own Erdős result, and the framing was deliberate. OpenAI's earlier claim drew criticism that its model had surfaced existing references to already-solved problems rather than constructing anything new; Demis Hassabis reportedly called that episode "embarrassing." DeepMind's pitch is that Lean-checked novelty removes the ambiguity — a proof either compiles or it does not.

Why the verification loop matters for builders

The technical lesson generalizes well beyond number theory. The bottleneck in agentic reasoning is rarely generating plausible output; it is trusting it. AlphaProof Nexus shows what happens when you wrap a frontier model in a hard, automated verifier: the system can grind through hundreds of attempts, discard the ones that fail the checker, and ship only what is provably correct — at a cost low enough, a few hundred dollars per result, to run at scale.
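The generate, verify, and discard pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeepMind's implementation: `solve_with_verifier`, `Solved`, and the toy factor-finding oracle are hypothetical names invented for the example, and a plain range of integers stands in for model-generated candidates.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Solved(Generic[T]):
    candidate: T   # the first candidate the verifier accepted
    attempts: int  # how many candidates were tried, including the accepted one


def solve_with_verifier(
    candidates: Iterable[T],
    verify: Callable[[T], bool],
    max_attempts: int,
) -> Optional[Solved[T]]:
    """Try candidates in order and return the first one the verifier accepts.

    Candidates that fail verification are simply discarded. Returns None if
    the attempt budget runs out, so nothing unverified is ever reported.
    """
    for attempt, candidate in enumerate(candidates, start=1):
        if attempt > max_attempts:
            break
        if verify(candidate):
            return Solved(candidate, attempt)
    return None


# Toy oracle (assumption for illustration): a candidate is "correct" if it is
# a nontrivial factor of 91. A real system would call a proof checker here.
def is_nontrivial_factor_of_91(c: int) -> bool:
    return 1 < c < 91 and 91 % c == 0


# The range stands in for a stream of model proposals.
result = solve_with_verifier(range(2, 91), is_nontrivial_factor_of_91, max_attempts=100)
print(result)  # Solved(candidate=7, attempts=6)
```

The design choice worth noting is that the generator never decides what counts as a solution. Only the verifier does, which is why a weak or noisy proposer can still yield trustworthy output as long as the attempt budget is affordable.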

For teams building agents in any domain with a formal oracle — theorem provers, type systems, test suites, constraint solvers, query planners — the takeaway is that verifier-in-the-loop architectures are now producing research-grade output, not toy demos. The economics reframe inference spend, too: a few hundred dollars to settle a 56-year-old open problem is a strong argument for pointing compute at verifiable reasoning rather than open-ended generation.
