Research

Google DeepMind Unveils 'AI Co-Mathematician' — and It Helps an Oxford Professor Crack an Open Problem

Michael Ouroumis · 2 min read

Google DeepMind has unveiled "AI co-mathematician," a multi-agent system built on its Gemini models that is designed to work alongside professional mathematicians on open research problems — and it has already helped an Oxford professor settle a question that had sat unresolved for years.

What Google built

Described in a paper from Google DeepMind and Google researchers led by Pushmeet Kohli, the system is not a single chatbot but a hierarchical team of agents. A project coordinator sits at the top; workstream coordinators below it manage tasks such as literature review, library development, and counterexample search; and specialized agents handle the work — a search agent, a coding agent, and Gemini Deep Think acting as a proof verifier. The system runs parallel workstreams, tracks uncertainty, and — crucially — preserves failed attempts rather than discarding them, producing mathematical working documents as it goes. It runs on Gemini 3.1, and reporting likened the design to AI coding environments like Claude Code, bringing agent teams and built-in review cycles to math research.
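How such a hierarchy fits together is easier to see in code. Below is a minimal, hypothetical Python sketch of the layout described above; the class names, the stand-in model calls, and the task routing are illustrative assumptions, not DeepMind's actual API. The one structural point it tries to capture is that every attempt, verified or not, lands in a persistent log.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    """One attempt at a task. Failed attempts are kept, not discarded."""
    task: str
    output: str
    verified: bool


@dataclass
class Agent:
    """A specialized worker, e.g. 'search', 'coding', or 'proof-verifier'."""
    role: str

    def work(self, task: str) -> str:
        # Placeholder for a real model call (e.g. a Gemini API request).
        return f"[{self.role}] draft for: {task}"


@dataclass
class WorkstreamCoordinator:
    """Manages one workstream, e.g. literature review or counterexample search."""
    name: str
    agents: list[Agent]
    log: list[Attempt] = field(default_factory=list)  # the "working document"

    def run(self, task: str, verifier: Agent) -> Attempt:
        draft = self.agents[0].work(task)
        # Stand-in verification; a real system would run a proof checker here.
        verified = "flaw" not in verifier.work(draft)
        attempt = Attempt(task, draft, verified)
        self.log.append(attempt)  # preserved even when verification fails
        return attempt


@dataclass
class ProjectCoordinator:
    """Top of the hierarchy: fans tasks out to the workstreams below it."""
    workstreams: list[WorkstreamCoordinator]
    verifier: Agent

    def dispatch(self, task: str) -> list[Attempt]:
        # The paper describes parallel workstreams; serial here for clarity.
        return [ws.run(task, self.verifier) for ws in self.workstreams]
```

The shape, not the plumbing, is the point: one coordinator dispatching to workstreams, a shared verifier, and a log that accumulates rejected drafts alongside accepted ones.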

"The future of Math is mathematicians and AI agents working together," Kohli wrote, calling it "a multi-agent system designed to actively collaborate with human experts on open-ended research mathematics."

A benchmark high — and a real result

On FrontierMath Tier 4 — Epoch AI's hardest tier, with problems it says are "designed to surpass Tier 3 in difficulty, with some potentially remaining unsolved by AI for decades" — the system scored 48%, solving 23 of 48 problems, up from 19% for Gemini 3.1 Pro alone. That is a new high among AI systems evaluated on the tier.

The more striking demonstration was human-AI collaboration. Marc Lackenby, a mathematician at Oxford, used the system on an open problem from the Kourovka Notebook (Problem 21.10, in group theory). The AI's first proof attempt contained a flaw — but a reviewer agent flagged it, and in working through the rejected output Lackenby spotted what he called a "really, really clever proof strategy" and realized he knew how to fill the gap. "The system works best when the user is familiar with the area," he noted.
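That review cycle, in which a draft is checked, rejected, and kept for the human to read, can be sketched in a few lines. Everything here is a hypothetical stand-in: `draft_proof`, `review`, and `collaborate` are invented names, and the flaw check is a toy placeholder, not the system's verifier.

```python
def draft_proof(problem: str, attempt_no: int) -> str:
    # Stand-in for the prover agent; a real call would go to a model.
    return f"proof sketch #{attempt_no} for {problem}"


def review(proof: str) -> str | None:
    """Return a description of the flaw, or None if the draft passes."""
    # Toy check: pretend the first draft always has a gap.
    return "gap in the key lemma" if "#1" in proof else None


def collaborate(problem: str, max_attempts: int = 3):
    rejected = []  # flawed drafts are kept; they can still carry good ideas
    for n in range(1, max_attempts + 1):
        proof = draft_proof(problem, n)
        flaw = review(proof)
        if flaw is None:
            return proof, rejected
        rejected.append((proof, flaw))
    return None, rejected


proof, rejected = collaborate("Kourovka Notebook 21.10")
for draft, flaw in rejected:
    # Rejected output is surfaced to the human rather than binned.
    print(f"rejected: {draft!r} ({flaw})")
```

The detail the sketch tries to capture is the return value: rejected drafts come back alongside the result, because in the episode reported here the flawed draft was where the useful idea lived.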

Why it matters

The release lands amid a run of AI-and-mathematics moments. Fields Medalist Timothy Gowers recently reported that ChatGPT 5.5 Pro improved bounds on open number-theory problems posed by Melvyn Nathanson in under two hours, with one collaborator calling the model's key idea "completely original." DeepMind's framing is deliberately collaborative: an assistant that surfaces literature, runs computations, drafts proofs, and verifies them — leaving the mathematician to steer.

For now, AI co-mathematician is in a limited initial release detailed in an arXiv paper; Google says it wants to build products offering broader access. The bigger question it raises is the one running through every research field this year: as agentic systems get good enough to contribute genuinely original ideas, what happens to the training pipeline for the next generation of experts?

