Google has announced Gemini 3.1 Pro, a major update to its flagship AI model that more than doubles reasoning performance compared to the previous generation. The model scored 77.1% on ARC-AGI-2, a benchmark specifically designed to test abstract reasoning capabilities.
The Numbers
The ARC-AGI-2 benchmark measures an AI model's ability to solve novel reasoning tasks that require genuine abstraction rather than pattern matching from training data. Gemini 3.1 Pro's 77.1% score represents a significant jump from Gemini 3 Pro's results, indicating real progress in the model's ability to reason about unfamiliar problems.
This performance places Gemini 3.1 Pro among the top-performing models on what many researchers consider the most challenging reasoning benchmark available today.
Where It's Available
Google is rolling out Gemini 3.1 Pro across its entire AI platform ecosystem (a brief API sketch follows the list):
- Vertex AI — For enterprise customers building production applications
- Google AI Studio — For developers experimenting and prototyping
- Gemini CLI — For command-line workflows and automation
- Google Antigravity — Google's experimental AI development environment
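For developers, the most direct path is the Gemini API, the same service that backs Google AI Studio. Below is a minimal sketch using the google-genai Python SDK; note that the model identifier gemini-3.1-pro is an assumption for illustration, so check the model list in AI Studio or the Vertex AI model garden for the published ID.

```python
# Minimal sketch: calling the new model through the Gemini API using the
# google-genai Python SDK (pip install google-genai).
# ASSUMPTION: the model ID "gemini-3.1-pro" is illustrative only; confirm
# the exact identifier in Google AI Studio before relying on it.
from google import genai

# The client picks up an API key from the environment (e.g. GOOGLE_API_KEY).
client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.1-pro",  # assumed ID for illustration
    contents=(
        "A farmer has 17 sheep; all but 9 run away. "
        "How many are left? Explain your reasoning step by step."
    ),
)
print(response.text)
```

The same SDK can target the enterprise path instead: constructing the client with genai.Client(vertexai=True, project="...", location="...") routes requests through Vertex AI rather than the consumer API-key flow.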
What's Improved
Beyond the headline reasoning benchmark, Gemini 3.1 Pro shows improvements across several areas:
Mathematical Reasoning
The model handles multi-step mathematical proofs and calculations with greater reliability, reducing the error rate on complex derivations.
Code Understanding
Gemini 3.1 Pro demonstrates stronger ability to reason about code behavior, identify bugs through logical analysis, and suggest fixes that address root causes rather than symptoms.
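To make the root-cause-versus-symptom distinction concrete, here is a hedged sketch of how a developer might exercise this capability. The buggy function is invented for illustration (an off-by-one that silently drops the last window), and the model ID remains the same assumption as above.

```python
# Hedged sketch: asking the model to reason about why a function misbehaves
# and to fix the underlying cause, not just the visible symptom.
# The snippet and prompt are invented for illustration; "gemini-3.1-pro"
# is an assumed model ID.
from google import genai

# Planted bug: range(len(values) - window) stops one window early, so the
# final window is never averaged; the root-cause fix is "+ 1" in the bound.
BUGGY_SNIPPET = '''
def moving_average(values, window):
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window)]
'''

client = genai.Client()
response = client.models.generate_content(
    model="gemini-3.1-pro",  # assumed ID for illustration
    contents=(
        "This moving-average function returns one window too few. "
        "Explain the root cause and propose a minimal fix:\n" + BUGGY_SNIPPET
    ),
)
print(response.text)
```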
Long-Context Reasoning
The model maintains coherent reasoning across longer contexts, making it more effective for tasks that require synthesizing information from large documents or codebases.
Implications for the Model Race
Google's announcement intensifies the competition among frontier AI labs. The focus on reasoning performance reflects a broader industry trend — raw language fluency is largely solved, and the differentiator is now how well models can think through complex, novel problems.
With OpenAI's GPT-5, Anthropic's Claude and its dominance in legal reasoning, and now Gemini 3.1 Pro all pushing hard on reasoning capabilities, the pace of improvement shows no signs of slowing down. For a detailed comparison of all three, see this ChatGPT vs Claude vs Gemini guide.


