Google has announced Gemini 3.1 Pro, a major update to its flagship AI model that more than doubles reasoning performance compared to the previous generation. The model scored 77.1% on ARC-AGI-2, a benchmark specifically designed to test abstract reasoning capabilities.
The Numbers
The ARC-AGI-2 benchmark measures an AI model's ability to solve novel reasoning tasks that require genuine abstraction rather than pattern matching from training data. Gemini 3.1 Pro's 77.1% score represents a significant jump from Gemini 3 Pro's results, indicating real progress in the model's ability to reason about unfamiliar problems.
This performance places Gemini 3.1 Pro among the top-performing models on what many researchers consider the most challenging reasoning benchmark available today.
Where It's Available
Google is rolling out Gemini 3.1 Pro across its entire AI platform ecosystem (a brief API sketch follows the list):
- Vertex AI — For enterprise customers building production applications
- Google AI Studio — For developers experimenting and prototyping
- Gemini CLI — For command-line workflows and automation
- Google Antigravity — Google's experimental AI development environment
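For developers, the most direct path is the Gemini API, the same service that backs Google AI Studio. Below is a minimal sketch using the google-genai Python SDK; note that the model identifier gemini-3.1-pro is an assumption for illustration, so check the model list in AI Studio or the Vertex AI model garden for the published ID.

```python
# Minimal sketch: calling the new model through the Gemini API using the
# google-genai Python SDK (pip install google-genai).
# ASSUMPTION: the model ID "gemini-3.1-pro" is illustrative only; confirm
# the exact identifier in Google AI Studio before relying on it.
from google import genai

# The client picks up an API key from the environment (e.g. GOOGLE_API_KEY).
client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.1-pro",  # assumed ID for illustration
    contents=(
        "A farmer has 17 sheep; all but 9 run away. "
        "How many are left? Explain your reasoning step by step."
    ),
)
print(response.text)
```

The same SDK can target the enterprise path instead: constructing the client with genai.Client(vertexai=True, project="...", location="...") routes requests through Vertex AI rather than the consumer API-key flow.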
What's Improved
Beyond the headline reasoning benchmark, Gemini 3.1 Pro shows improvements across several areas:
Mathematical Reasoning
The model handles multi-step mathematical proofs and calculations with greater reliability, reducing the error rate on complex derivations.
Code Understanding
Gemini 3.1 Pro demonstrates stronger ability to reason about code behavior, identify bugs through logical analysis, and suggest fixes that address root causes rather than symptoms.
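To make the root-cause-versus-symptom distinction concrete, here is a hedged sketch of how a developer might exercise this capability. The buggy function is invented for illustration (an off-by-one that silently drops the last window), and the model ID remains the same assumption as above.

```python
# Hedged sketch: asking the model to reason about why a function misbehaves
# and to fix the underlying cause, not just the visible symptom.
# The snippet and prompt are invented for illustration; "gemini-3.1-pro"
# is an assumed model ID.
from google import genai

# Planted bug: range(len(values) - window) stops one window early, so the
# final window is never averaged; the root-cause fix is "+ 1" in the bound.
BUGGY_SNIPPET = '''
def moving_average(values, window):
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window)]
'''

client = genai.Client()
response = client.models.generate_content(
    model="gemini-3.1-pro",  # assumed ID for illustration
    contents=(
        "This moving-average function returns one window too few. "
        "Explain the root cause and propose a minimal fix:\n" + BUGGY_SNIPPET
    ),
)
print(response.text)
```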
Long-Context Reasoning
The model maintains coherent reasoning across longer contexts, making it more effective for tasks that require synthesizing information from large documents or codebases.
Implications for the Model Race
Google's announcement intensifies the competition among frontier AI labs. The focus on reasoning performance reflects a broader industry trend — raw language fluency is largely solved, and the differentiator is now how well models can think through complex, novel problems.
With OpenAI's GPT-5, Anthropic's Claude and its dominance in legal reasoning, and now Gemini 3.1 Pro all pushing hard on reasoning capabilities, the pace of improvement shows no signs of slowing down. For a detailed comparison of all three, see this ChatGPT vs Claude vs Gemini guide.


