
Google Launches Gemini 3.1 Pro With Double the Reasoning Performance

Michael Ouroumis · 2 min read

Google has announced Gemini 3.1 Pro, a major update to its flagship AI model that more than doubles reasoning performance compared to the previous generation. The model scored 77.1% on ARC-AGI-2, a benchmark specifically designed to test abstract reasoning capabilities.

The Numbers

The ARC-AGI-2 benchmark measures an AI model's ability to solve novel reasoning tasks that require genuine abstraction rather than pattern matching from training data. Gemini 3.1 Pro's 77.1% score represents a significant jump from Gemini 3 Pro's results, indicating real progress in the model's ability to reason about unfamiliar problems.

This performance places Gemini 3.1 Pro among the top-performing models on what many researchers consider the most challenging reasoning benchmark available today.

Where It's Available

Google is rolling out Gemini 3.1 Pro across its AI platform ecosystem.

What's Improved

Beyond the headline reasoning benchmark, Gemini 3.1 Pro shows improvements across several areas:

Mathematical Reasoning

The model handles multi-step mathematical proofs and calculations with greater reliability, reducing the error rate on complex derivations.

Code Understanding

Gemini 3.1 Pro demonstrates stronger ability to reason about code behavior, identify bugs through logical analysis, and suggest fixes that address root causes rather than symptoms.

Long-Context Reasoning

The model maintains coherent reasoning across longer contexts, making it more effective for tasks that require synthesizing information from large documents or codebases.

Implications for the Model Race

Google's announcement intensifies the competition among frontier AI labs. The focus on reasoning performance reflects a broader industry trend — raw language fluency is largely solved, and the differentiator is now how well models can think through complex, novel problems.

With OpenAI's GPT-5, Anthropic's Claude, and now Gemini 3.1 Pro all pushing hard on reasoning capabilities, the pace of improvement shows no signs of slowing. For a detailed comparison of all three, see this ChatGPT vs Claude vs Gemini guide.
