Google DeepMind has unveiled "AI co-mathematician," a multi-agent system built on its Gemini models that is designed to work alongside professional mathematicians on open research problems — and it has already helped an Oxford professor settle a question that had sat unresolved for years.
What Google built
Described in a paper from Google DeepMind and Google researchers led by Pushmeet Kohli, the system is not a single chatbot but a hierarchical team of agents. A project coordinator sits at the top; workstream coordinators below it manage tasks such as literature review, library development, and counterexample search; and specialized agents handle the work — a search agent, a coding agent, and Gemini Deep Think acting as a proof verifier. The system runs parallel workstreams, tracks uncertainty, and — crucially — preserves failed attempts rather than discarding them, producing mathematical working documents as it goes. It runs on Gemini 3.1, and reporting has likened the design to AI coding environments such as Claude Code, bringing agent teams and built-in review cycles to math research.
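The hierarchy described above — a coordinator delegating to workstreams, which dispatch to specialized agents and keep every attempt on record — can be sketched in a few lines of Python. This is a hypothetical illustration only: the agent names, roles, and success logic are assumptions, since DeepMind has not published the system's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """A specialized worker (e.g. search, coding, proof verification)."""
    name: str

    def run(self, task: str) -> dict:
        # A real agent would call a model here; this stub just records the attempt.
        return {"agent": self.name, "task": task, "status": "attempted"}

@dataclass
class Workstream:
    """A mid-level coordinator for one task area (literature, counterexamples, ...)."""
    name: str
    agents: list[Agent]
    history: list[dict] = field(default_factory=list)  # failed attempts are kept, not discarded

    def execute(self, task: str) -> dict:
        for agent in self.agents:
            result = agent.run(task)
            self.history.append(result)  # preserve every attempt in the working record
            if result["status"] == "attempted":  # placeholder for a real success check
                return result
        return {"status": "unresolved", "task": task}

@dataclass
class ProjectCoordinator:
    """Top-level coordinator routing tasks to parallel workstreams."""
    workstreams: dict[str, Workstream]

    def dispatch(self, stream: str, task: str) -> dict:
        return self.workstreams[stream].execute(task)

coordinator = ProjectCoordinator({
    "literature": Workstream("literature", [Agent("search")]),
    "counterexamples": Workstream("counterexamples", [Agent("coding")]),
    "verification": Workstream("verification", [Agent("proof-verifier")]),
})

print(coordinator.dispatch("verification", "check lemma 3")["agent"])  # prints proof-verifier
```

The key design point the article highlights — retaining rejected output — is the `history` list: it is exactly that preserved record that let Lackenby mine a flawed proof attempt for a usable strategy.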
"The future of Math is mathematicians and AI agents working together," Kohli wrote, calling it "a multi-agent system designed to actively collaborate with human experts on open-ended research mathematics."
A benchmark high — and a real result
On FrontierMath Tier 4 — Epoch AI's hardest tier, with problems it says are "designed to surpass Tier 3 in difficulty, with some potentially remaining unsolved by AI for decades" — the system scored 48%, solving 23 of 48 problems, up from 19% for Gemini 3.1 Pro alone. That is a new high among AI systems evaluated on the tier.
The more striking demonstration was human-AI collaboration. Marc Lackenby, a mathematician at Oxford, used the system on an open problem from the Kourovka Notebook (Problem 21.10, in group theory). The AI's first proof attempt contained a flaw — but a reviewer agent flagged it, and in working through the rejected output Lackenby spotted what he called a "really, really clever proof strategy" and realized he knew how to fill the gap. "The system works best when the user is familiar with the area," he noted.
Why it matters
The release lands amid a run of AI-and-mathematics moments. Fields Medalist Timothy Gowers recently reported that ChatGPT 5.5 Pro improved bounds on open number-theory problems posed by Melvyn Nathanson in under two hours, with one collaborator calling the model's key idea "completely original." DeepMind's framing is deliberately collaborative: an assistant that surfaces literature, runs computations, drafts proofs, and verifies them — leaving the mathematician to steer.
For now, AI co-mathematician is in a limited initial release detailed in an arXiv paper; Google says it wants to build products offering broader access. The bigger question it raises is the one running through every research field this year: as agentic systems get good enough to contribute genuinely original ideas, what happens to the training pipeline for the next generation of experts?