Research

Google DeepMind Unveils 'AI Co-Mathematician' — and It Helps an Oxford Professor Crack an Open Problem

Michael Ouroumis · 2 min read

Google DeepMind has unveiled "AI co-mathematician," a multi-agent system built on its Gemini models that is designed to work alongside professional mathematicians on open research problems — and it has already helped an Oxford professor settle a question that had sat unresolved for years.

What Google built

Described in a paper from Google DeepMind and Google researchers led by Pushmeet Kohli, the system is not a single chatbot but a hierarchical team of agents. A project coordinator sits at the top; workstream coordinators below it manage tasks such as literature review, library development, and counterexample search; and specialized agents handle the work — a search agent, a coding agent, and Gemini Deep Think acting as a proof verifier. The system runs parallel workstreams, tracks uncertainty, and — crucially — preserves failed attempts rather than discarding them, producing mathematical working documents as it goes. It runs on Gemini 3.1, and reporting likened the design to AI coding environments like Claude Code, bringing agent teams and built-in review cycles to math research.
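How such a hierarchy fits together is easier to see in code. Below is a minimal, hypothetical Python sketch of the layout described above; the class names, the stand-in model calls, and the task routing are illustrative assumptions, not DeepMind's actual API. The one structural point it tries to capture is that every attempt, verified or not, lands in a persistent log.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    """One attempt at a task. Failed attempts are kept, not discarded."""
    task: str
    output: str
    verified: bool


@dataclass
class Agent:
    """A specialized worker, e.g. 'search', 'coding', or 'proof-verifier'."""
    role: str

    def work(self, task: str) -> str:
        # Placeholder for a real model call (e.g. a Gemini API request).
        return f"[{self.role}] draft for: {task}"


@dataclass
class WorkstreamCoordinator:
    """Manages one workstream, e.g. literature review or counterexample search."""
    name: str
    agents: list[Agent]
    log: list[Attempt] = field(default_factory=list)  # the "working document"

    def run(self, task: str, verifier: Agent) -> Attempt:
        draft = self.agents[0].work(task)
        # Stand-in verification; a real system would run a proof checker here.
        verified = "flaw" not in verifier.work(draft)
        attempt = Attempt(task, draft, verified)
        self.log.append(attempt)  # preserved even when verification fails
        return attempt


@dataclass
class ProjectCoordinator:
    """Top of the hierarchy: fans tasks out to the workstreams below it."""
    workstreams: list[WorkstreamCoordinator]
    verifier: Agent

    def dispatch(self, task: str) -> list[Attempt]:
        # The paper describes parallel workstreams; serial here for clarity.
        return [ws.run(task, self.verifier) for ws in self.workstreams]
```

The shape, not the plumbing, is the point: one coordinator dispatching to workstreams, a shared verifier, and a log that accumulates rejected drafts alongside accepted ones.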

"The future of Math is mathematicians and AI agents working together," Kohli wrote, calling it "a multi-agent system designed to actively collaborate with human experts on open-ended research mathematics."

A benchmark high — and a real result

On FrontierMath Tier 4 — Epoch AI's hardest tier, with problems it says are "designed to surpass Tier 3 in difficulty, with some potentially remaining unsolved by AI for decades" — the system scored 48%, solving 23 of 48 problems, up from 19% for Gemini 3.1 Pro alone. That is a new high among AI systems evaluated on the tier.

The more striking demonstration was human-AI collaboration. Marc Lackenby, a mathematician at Oxford, used the system on an open problem from the Kourovka Notebook (Problem 21.10, in group theory). The AI's first proof attempt contained a flaw — but a reviewer agent flagged it, and in working through the rejected output Lackenby spotted what he called a "really, really clever proof strategy" and realized he knew how to fill the gap. "The system works best when the user is familiar with the area," he noted.
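That review cycle, in which a draft is checked, rejected, and kept for the human to read, can be sketched in a few lines. Everything here is a hypothetical stand-in: `draft_proof`, `review`, and `collaborate` are invented names, and the flaw check is a toy placeholder, not the system's verifier.

```python
def draft_proof(problem: str, attempt_no: int) -> str:
    # Stand-in for the prover agent; a real call would go to a model.
    return f"proof sketch #{attempt_no} for {problem}"


def review(proof: str) -> str | None:
    """Return a description of the flaw, or None if the draft passes."""
    # Toy check: pretend the first draft always has a gap.
    return "gap in the key lemma" if "#1" in proof else None


def collaborate(problem: str, max_attempts: int = 3):
    rejected = []  # flawed drafts are kept; they can still carry good ideas
    for n in range(1, max_attempts + 1):
        proof = draft_proof(problem, n)
        flaw = review(proof)
        if flaw is None:
            return proof, rejected
        rejected.append((proof, flaw))
    return None, rejected


proof, rejected = collaborate("Kourovka Notebook 21.10")
for draft, flaw in rejected:
    # Rejected output is surfaced to the human rather than binned.
    print(f"rejected: {draft!r} ({flaw})")
```

The detail the sketch tries to capture is the return value: rejected drafts come back alongside the result, because in the episode reported here the flawed draft was where the useful idea lived.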

Why it matters

The release lands amid a run of AI-and-mathematics moments. Fields Medalist Timothy Gowers recently reported that ChatGPT 5.5 Pro improved bounds on open number-theory problems posed by Melvyn Nathanson in under two hours, with one collaborator calling the model's key idea "completely original." DeepMind's framing is deliberately collaborative: an assistant that surfaces literature, runs computations, drafts proofs, and verifies them — leaving the mathematician to steer.

For now, AI co-mathematician is in a limited initial release detailed in an arXiv paper; Google says it wants to build products offering broader access. The bigger question it raises is the one running through every research field this year: as agentic systems get good enough to contribute genuinely original ideas, what happens to the training pipeline for the next generation of experts?

