The Quantum Dispatch

Google Gemini 3.1 Pro Doubles Reasoning Performance With a New Three-Tier Thinking System

Google DeepMind’s Gemini 3.1 Pro scores 77.1% on ARC-AGI-2, more than doubling its predecessor’s reasoning with a three-tier thinking architecture.

Dr. Nova Chen · Feb 25, 2026 · 5 min read

Google DeepMind just raised the bar on what a large language model can do. Gemini 3.1 Pro, released on February 19, is the company’s most advanced model for complex reasoning tasks — and the benchmark numbers speak for themselves.

A Leap in AI Reasoning Performance

On the ARC-AGI-2 benchmark — a rigorous evaluation that measures a model’s ability to solve entirely new logic patterns it has never seen before — Gemini 3.1 Pro scored a verified 77.1%. That is more than double the reasoning performance of its predecessor, Gemini 3 Pro, marking one of the largest single-generation jumps in LLM reasoning capability we have seen to date.

This is not just an incremental improvement. ARC-AGI-2 specifically tests the kind of abstract reasoning that has historically separated human cognition from machine pattern matching. A score this high suggests that Gemini 3.1 Pro is beginning to approach genuinely novel problem-solving territory.

The Three-Tier Thinking Architecture

Perhaps the most interesting technical innovation is the introduction of a three-tier thinking system. Previous Gemini versions operated with a binary approach — either low or high computational effort. Gemini 3.1 Pro adds a Medium parameter, giving developers fine-grained control over the trade-off between response latency and reasoning depth.

In practice, this means developers can dial in exactly how much compute they want the model to spend on a given task. Quick factual lookups get the fast path; multi-step mathematical proofs get the full reasoning engine. It is a smart architectural choice that addresses one of the biggest pain points of reasoning-heavy models: paying for maximum reasoning compute on every request, whether the task needs it or not.
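To make the trade-off concrete, here is a minimal sketch of how an application might route tasks across the three tiers. The parameter name `thinking_level`, the tier names, and the request shape are assumptions based on the article's description, not a confirmed API surface:

```python
# Hypothetical routing logic for Gemini 3.1 Pro's three-tier thinking system.
# "thinking_level" and the request layout below are illustrative assumptions,
# not an official SDK interface.

THINKING_LEVELS = ("low", "medium", "high")

def pick_thinking_level(task: str) -> str:
    """Map a task category to a thinking tier: the fast path for quick
    lookups, full reasoning for proofs, and medium for everything else."""
    fast = {"lookup", "classification", "extraction"}
    deep = {"proof", "planning", "multi_step_math"}
    if task in fast:
        return "low"
    if task in deep:
        return "high"
    return "medium"

def build_request(prompt: str, task: str) -> dict:
    """Assemble an illustrative request payload with the chosen tier."""
    return {
        "model": "gemini-3.1-pro",
        "contents": prompt,
        "config": {"thinking_level": pick_thinking_level(task)},
    }
```

Under this scheme, a factual lookup ships with `"low"` and a multi-step proof with `"high"`, so only the tasks that genuinely need deep reasoning pay for it.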

Massive Context Window and Output Capacity

The raw specifications are equally impressive. Gemini 3.1 Pro supports a 1-million-token input context window and can generate up to 65,536 tokens of output — roughly 50,000 words in a single response. For multimodal processing, the model handles up to 900 images per prompt, 8.4 hours of continuous audio, and one hour of video.
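For teams sizing workloads against those limits, a simple pre-flight check can reject oversized prompts before they reach the API. The constants mirror the figures above; the helper itself is a hypothetical convenience, not part of any official SDK:

```python
# Illustrative pre-flight check against Gemini 3.1 Pro's published limits.
# The constants come from the stated specs; the function is a hypothetical
# helper, not an official API.

MAX_INPUT_TOKENS = 1_000_000   # 1M-token input context window
MAX_OUTPUT_TOKENS = 65_536     # maximum output tokens per response
MAX_IMAGES = 900               # images per prompt
MAX_AUDIO_HOURS = 8.4          # continuous audio per prompt
MAX_VIDEO_HOURS = 1.0          # video per prompt

def fits_limits(input_tokens: int,
                images: int = 0,
                audio_hours: float = 0.0,
                video_hours: float = 0.0) -> bool:
    """Return True when a prompt stays within the model's stated limits."""
    return (
        input_tokens <= MAX_INPUT_TOKENS
        and images <= MAX_IMAGES
        and audio_hours <= MAX_AUDIO_HOURS
        and video_hours <= MAX_VIDEO_HOURS
    )
```

A 500,000-token codebase with a hundred screenshots passes; a two-million-token archive would need to be chunked first.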

These numbers position Gemini 3.1 Pro as the model to beat for enterprise workloads that involve processing large documents, codebases, or multimedia archives.

Where Developers Can Access Gemini 3.1 Pro

Gemini 3.1 Pro is rolling out now across the Gemini app for AI Pro and Ultra subscribers, the Gemini API via AI Studio and Vertex AI, GitHub Copilot, NotebookLM, Gemini CLI, and Android Studio. Enterprise users get access through Gemini Enterprise, with full support for long-horizon agentic workflows.

The AI reasoning race just got significantly more competitive — and developers everywhere are the ones who benefit.

Sources: Google AI Blog, February 2026; Google DeepMind Model Card, February 2026; 9to5Google, February 19, 2026; Android Headlines, February 2026