
Frontier AI Models Solve an Open Math Problem That Stumped Humans for Years

Michael Ouroumis · 3 min read

AI systems have solved an open mathematics problem that had stumped human researchers since 2019.

Epoch AI reported that GPT-5.4 Pro became the first model to clear FrontierMath's open-problem track, solving a conjecture on Ramsey hypergraphs that the original authors had been unable to resolve. Gemini 3.1 Pro and Claude Opus 4.6 subsequently solved it as well.

The distinction matters: these are not problems where the solution exists and AI found it faster. These are problems that were genuinely unsolved — by the humans who created them — when the models encountered them.

What Ramsey Hypergraphs Are

Ramsey theory is a branch of combinatorics concerned with conditions under which order must appear in structures that seem chaotic. Hypergraph problems in this space involve understanding how colors, connections, or patterns must emerge across high-dimensional graph structures once certain size or density thresholds are crossed.
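To give a sense of the kind of statement involved, here is a sketch of the classical Ramsey theorem for k-uniform hypergraphs. This is an illustrative, well-known result, not the 2019 conjecture itself, whose exact statement is not reproduced here.

% Illustrative only: the classical Ramsey theorem for k-uniform hypergraphs,
% not the 2019 conjecture the models solved.
% For all positive integers k, r, s there is a finite N = R_k(s; r) such that
% every r-coloring of the k-element subsets of {1, ..., N} yields a set S of
% s elements whose k-element subsets all receive the same color.
\[
\forall\, k, r, s \;\; \exists\, N = R_k(s; r):\quad
\text{for every coloring } \chi : \binom{[N]}{k} \to \{1,\dots,r\}
\ \text{there is } S \subseteq [N],\ |S| = s,\ \text{with } \chi \text{ constant on } \binom{S}{k}.
\]

The 2019 conjecture concerned a sharper question about specific Ramsey configurations in hypergraphs; the statement above is only meant to show the shape of the objects these models were reasoning about.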

The specific 2019 conjecture that AI solved involved predicting the existence or properties of certain Ramsey configurations in hypergraphs. The original authors could not find a proof. Neither could subsequent researchers. FrontierMath — a benchmark specifically designed to contain problems beyond current human solving capacity — had listed it as an open problem.

GPT-5.4 Pro produced a valid solution.

The IQ Trajectory That Makes This Less Surprising

Epoch AI's announcement landed in the same week that researcher Charbel-Raphael Ségerie published a striking data point: in March 2023, Claude had an estimated IQ equivalent of approximately 64 on standardized reasoning tests. Today, Claude Opus 4.6 scores 133 on the Mensa Norway test. GPT-5.2 Thinking scores 141. Gemini 3 Pro reaches 142.

That's a jump from the cognitively impaired range to the gifted range in roughly three years. For comparison, the Flynn effect, the long-run rise in human IQ scores over the twentieth century, amounted to roughly three points per decade. No human population in recorded history has improved anywhere near this fast on standardized cognitive assessments.

The Ramsey hypergraph result fits this trajectory. Models aren't just getting better at producing fluent text — they're getting better at mathematical reasoning, at decomposing novel problems into tractable subproblems, and at generating and verifying proofs. The same week that Claude proved it can do original theoretical physics research, another cluster of frontier models proved they can extend human mathematics.

What FrontierMath Is

FrontierMath is a benchmark developed specifically to stay ahead of AI capability. Standard math benchmarks like MATH and GSM8K were saturated — models were scoring at or near 100% — and stopped measuring meaningful differences between frontier systems.

FrontierMath collects problems from working mathematicians, many of which involve research-level difficulty or genuinely open questions. The open-problem track is its most extreme tier: problems listed there have no known human solution at the time they're added.

The fact that frontier models have now cleared this track doesn't mean AI has solved mathematics. It means the tier of problems that can serve as a meaningful test of frontier AI capability has moved again, further into territory that was previously considered uniquely human.

Implications

The practical significance of AI solving open math problems is still being worked out. Mathematical research doesn't produce immediate products, but it underpins fields ranging from cryptography and materials science to fundamental physics. A system that can advance mathematics could, in principle, accelerate progress across all of those areas.

More immediately, the result updates the timeline for when AI might be considered a genuine research collaborator in formal domains. The cautious view — that AI systems are good at pattern matching but can't do real mathematical reasoning — has become harder to hold. What happened with the Ramsey hypergraph conjecture is closer to genuine mathematical discovery than anything AI systems had previously demonstrated.

