
Google Launches Gemini 3.1 Pro With Double the Reasoning Performance

Michael Ouroumis · 2 min read

Google has announced Gemini 3.1 Pro, a major update to its flagship AI model that more than doubles reasoning performance compared to the previous generation. The model scored 77.1% on ARC-AGI-2, a benchmark specifically designed to test abstract reasoning capabilities.

The Numbers

The ARC-AGI-2 benchmark measures an AI model's ability to solve novel reasoning tasks that require genuine abstraction rather than pattern matching from training data. Gemini 3.1 Pro's 77.1% score represents a significant jump from Gemini 3 Pro's results, indicating real progress in the model's ability to reason about unfamiliar problems.
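ARC-style tasks present a few input-to-output grid pairs and ask the solver to infer the underlying transformation, then apply it to a novel input. The toy puzzle below is purely illustrative (far simpler than a real ARC-AGI-2 task, and the mirror rule is an invented example), but it shows the shape of the problem: the rule must be recovered from a demonstration pair rather than memorized.

```python
# Toy ARC-style puzzle: infer a grid transformation from an example
# pair, then apply it to a test input. Real ARC-AGI-2 tasks are far
# harder; this only illustrates the input -> output format.

def mirror_horizontal(grid):
    """Candidate rule: reflect each row left-to-right."""
    return [list(reversed(row)) for row in grid]

# One demonstration pair the solver can learn the rule from.
example_in  = [[1, 0, 0],
               [2, 1, 0]]
example_out = [[0, 0, 1],
               [0, 1, 2]]

# Check that the candidate rule explains the demonstration...
assert mirror_horizontal(example_in) == example_out

# ...then apply it to a novel test input.
test_in = [[3, 0], [0, 4]]
print(mirror_horizontal(test_in))  # [[0, 3], [4, 0]]
```

The point of the benchmark is that the transformation varies from task to task, so a solver cannot rely on pattern matching against training data.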

This performance places Gemini 3.1 Pro among the top-performing models on what many researchers consider the most challenging reasoning benchmark available today.

Where It's Available

Google is rolling out Gemini 3.1 Pro across its AI platform ecosystem.
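The announcement does not list concrete endpoints, but as a sketch, Gemini models are typically reachable through Google's `generateContent` REST API, whose request body uses a `contents`/`parts` structure. The helper below only constructs that request (no network call is made); the model ID `gemini-3.1-pro` is an assumption based on Google's usual naming and should be verified against the official API documentation.

```python
# Sketch: build a generateContent-style request for the Gemini REST
# API. The model ID "gemini-3.1-pro" is assumed, not confirmed by the
# article; check Google's API docs for the actual identifier.
import json

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"

def build_request(model: str, prompt: str):
    """Return (url, json_body) for a text-only generateContent call."""
    url = f"{API_ROOT}/models/{model}:generateContent"
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, json.dumps(payload)

url, body = build_request("gemini-3.1-pro", "Prove that sqrt(2) is irrational.")
print(url)
print(body)
```

A real call would POST this body with an API key header; separating request construction from transport keeps the sketch runnable offline.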

What's Improved

Beyond the headline reasoning benchmark, Gemini 3.1 Pro shows improvements across several areas:

Mathematical Reasoning

The model handles multi-step mathematical proofs and calculations with greater reliability, reducing the error rate on complex derivations.

Code Understanding

Gemini 3.1 Pro demonstrates stronger ability to reason about code behavior, identify bugs through logical analysis, and suggest fixes that address root causes rather than symptoms.

Long-Context Reasoning

The model maintains coherent reasoning across longer contexts, making it more effective for tasks that require synthesizing information from large documents or codebases.

Implications for the Model Race

Google's announcement intensifies the competition among frontier AI labs. The focus on reasoning performance reflects a broader industry trend — raw language fluency is largely solved, and the differentiator is now how well models can think through complex, novel problems.

With OpenAI's GPT-5, Claude's legal reasoning dominance, and now Gemini 3.1 Pro all pushing hard on reasoning capabilities, the pace of improvement shows no signs of slowing down. For a detailed comparison of all three, see this ChatGPT vs Claude vs Gemini guide.

