The Quantum Dispatch

Claude Opus 4.7 Is Here: +13% Coding, 3× Vision Gains, and a New Performance Ceiling

Anthropic releases Claude Opus 4.7 today with 87.6% on SWE-bench Verified, 70% on CursorBench, and 98.5% visual acuity — taking the top spot on agentic coding benchmarks ahead of GPT-5.4 and Gemini 3.1 Pro.

Dr. Nova Chen · Apr 16, 2026 · 6 min read

Anthropic Ships Claude Opus 4.7 — And the Benchmarks Are Striking

Released today, April 16, 2026, Claude Opus 4.7 is Anthropic's most capable generally available model — and it arrives with benchmark numbers that demand attention. On agentic coding, vision, and graduate-level reasoning, Opus 4.7 establishes new performance ceilings for the publicly available frontier, pulling ahead of GPT-5.4 and Gemini 3.1 Pro across the metrics that matter most for real-world AI deployment.

Opus 4.7 is available right now via the Claude API (`claude-opus-4-7`), Claude.ai (Pro and Max plans), Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Pricing is unchanged from Opus 4.6: **$5 per million input tokens and $25 per million output tokens**, with up to 90% savings via prompt caching and 50% via batch processing.
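
Because the model id is live today, migrating is mostly a one-line change. The sketch below builds a standard Messages API request body for the new model; the request shape is Anthropic's existing Messages API, and only the `claude-opus-4-7` id comes from this release:

```python
# Minimal sketch of a Messages API request body targeting Opus 4.7.
# The fields below follow Anthropic's standard Messages API; only the
# model id ("claude-opus-4-7") is new with this release.
request = {
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Summarize the open pull requests in this repo."}
    ],
}
```

With the official Python SDK, this body would be sent as `client.messages.create(**request)`; the same shape applies on Bedrock, Vertex AI, and Foundry via their respective Anthropic integrations.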

Benchmark Breakdown: Where Opus 4.7 Leads

Agentic Coding

The headline numbers are on software engineering benchmarks — the domain where Anthropic has most aggressively pushed Opus 4.7's capabilities:

| Benchmark | Opus 4.7 | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|-----------|----------|----------|---------|----------------|
| SWE-bench Verified | **87.6%** | 80.8% | — | 80.6% |
| SWE-bench Pro | **64.3%** | 53.4% | 57.7% | 54.2% |
| CursorBench | **70%** | 58% | — | — |
The SWE-bench Pro jump from 53.4% to 64.3% represents a +10.9 percentage point improvement over Opus 4.6 — and a clear lead over GPT-5.4's 57.7%. In practical terms, Anthropic reports that Opus 4.7 resolves **3× more production tasks** than Opus 4.6 on real software engineering workflows. For teams using Claude as a coding assistant or agentic programming tool, that is not an incremental gain — it is a step change in what the model can accomplish autonomously.

CursorBench — which measures model performance specifically as a coding assistant operating inside a real IDE environment — rose from 58% to 70%, a +12 point improvement that positions Opus 4.7 as the leading model for IDE-integrated development workflows.

Vision: From 54.5% to 98.5%

The vision improvement in Opus 4.7 is the most dramatic single-metric jump in the release. Visual acuity for computer use rose from 54.5% to **98.5%**. That is not a refinement; that is a near-total transformation of the model's visual capability.

Supporting this, Opus 4.7 now processes images at up to 2,576 pixels on the long edge (3.75+ megapixels), enabling detailed analysis of high-resolution technical documents, screenshots, schematics, and visual data. Document reasoning (OfficeQA Pro) also improved, with **21% fewer errors** than Opus 4.6.

Reasoning and Knowledge Work

| Benchmark | Opus 4.7 | GPT-5.4 Pro | Gemini 3.1 Pro |
|-----------|----------|-------------|----------------|
| GPQA Diamond | 94.2% | **94.4%** | 94.3% |
| ARC-AGI-2 | **77.1%** | — | — |

On GPQA Diamond — the graduate-level science and reasoning benchmark — Opus 4.7 reaches 94.2%, sitting within 0.2 points of GPT-5.4 Pro and ahead of Gemini 3.1 Pro. The ARC-AGI-2 score of 77.1% demonstrates strong performance on novel pattern recognition tasks designed to resist memorization and require genuine generalization.

New Features in Opus 4.7

1M Context Window — No Premium

Opus 4.7 includes the full **1 million token context window at standard per-token pricing**. There is no long-context surcharge — a 900K-token request costs the same rate as a 9K-token request. For teams running long agentic sessions, large codebase analyses, or extended document processing pipelines, this removes a pricing consideration that previously required careful context management.
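
The flat-rate claim is easy to sanity-check against the published prices ($5/M input, $25/M output). A minimal cost sketch, using only the rates stated above:

```python
# Cost sketch for Opus 4.7's flat long-context pricing, using the rates
# from the announcement: $5 per million input tokens, $25 per million output.
INPUT_RATE = 5.00    # USD per 1M input tokens
OUTPUT_RATE = 25.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return USD cost; the per-token rate is flat regardless of context size."""
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE

# A 900K-token prompt is billed at the same per-token rate as a 9K-token one:
print(round(request_cost(900_000, 1_000), 3))  # 4.525
print(round(request_cost(9_000, 1_000), 3))    # 0.07
```

Prompt caching (up to 90% off) and batch processing (50% off) would apply multiplicatively on top of these base rates.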

Note: Opus 4.7 uses a new tokenizer that may consume up to 35% more tokens for equivalent text compared to previous Claude models — factor this into cost estimates for existing prompts.
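
For budgeting, the safe move is to pad existing token estimates by the worst-case factor. A trivial sketch, assuming the full 35% overhead applies:

```python
# Budgeting sketch for the new tokenizer: the release notes warn that token
# counts may run up to 35% higher for equivalent text, so pad prior estimates
# by the worst-case 1.35 factor when forecasting costs for existing prompts.
TOKENIZER_OVERHEAD = 1.35  # worst-case factor quoted in the release notes

def padded_token_estimate(old_model_tokens: int) -> int:
    """Upper-bound token estimate when migrating a prompt to Opus 4.7."""
    return int(old_model_tokens * TOKENIZER_OVERHEAD)

print(padded_token_estimate(100_000))  # 135000
```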

xHigh Effort Level

A new **"xhigh" effort level** sits between "high" and "max," giving developers finer-grained control over the reasoning depth vs. latency tradeoff. This is particularly useful for applications where "max" is overkill for most queries but "high" leaves performance on the table for complex ones.
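
The announcement doesn't show the request shape for effort levels, so the snippet below is a hypothetical illustration only: the `effort` field name is an assumption, and the tier list includes just the three levels the article names.

```python
# Hypothetical request sketch for the "xhigh" effort tier. The `effort`
# field name is an assumption, not documented API; the ordering of the
# tiers ("high" < "xhigh" < "max") is as described in the announcement.
EFFORT_LEVELS = ["high", "xhigh", "max"]  # tiers named in the article

request = {
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    "effort": "xhigh",  # deeper reasoning than "high", cheaper than "max"
    "messages": [{"role": "user", "content": "Refactor this module for clarity."}],
}
```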

Adaptive Thinking

Opus 4.7 includes **adaptive thinking mode**: the model dynamically determines how much internal reasoning to apply based on task complexity. Simple queries get fast responses; complex multi-step reasoning tasks get appropriately deep deliberation — automatically, without developer tuning.

Task Budgets (Public Beta)

A new **task budgets** feature — now in public beta — allows developers to set token spend guidance for long-running agentic tasks. Rather than letting open-ended agents run without cost constraints, task budgets give the model a target envelope to work within, improving predictability for production deployments.
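
The beta's API surface isn't shown in the announcement, so here is a conceptual sketch of the idea itself rather than the actual feature: give a long-running loop a token envelope and stop before overshooting. All names are illustrative.

```python
# Conceptual sketch of a task budget for an agentic loop: track token spend
# against a target envelope and stop before exceeding it. Illustrative only;
# the actual beta API shape is not shown in the announcement.
def run_with_budget(steps, budget_tokens: int):
    """`steps` yields (result, tokens_used) pairs; stop when the budget is hit."""
    spent = 0
    results = []
    for result, tokens_used in steps:
        if spent + tokens_used > budget_tokens:
            break  # envelope exhausted: a real agent would wrap up gracefully
        spent += tokens_used
        results.append(result)
    return results, spent

# Three simulated 40K-token agent steps against a 100K-token budget:
fake_steps = [("step-1", 40_000), ("step-2", 40_000), ("step-3", 40_000)]
print(run_with_budget(fake_steps, 100_000))  # (['step-1', 'step-2'], 80000)
```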

/ultrareview Command

Claude Code users on Pro and Max plans gain a new **/ultrareview** slash command — a dedicated, high-intensity code review session that applies Opus 4.7's full capabilities to systematic review of a codebase, PR, or file. It is distinct from inline suggestions: /ultrareview is a structured session designed for thorough pre-merge or pre-deployment analysis.

Improved Memory for Agentic Workflows

Opus 4.7 shows measurable improvements in reading from and writing to **file-system-based memory** across sessions — a critical capability for long-running agents that maintain state, carry context forward, and build on prior work without re-processing.
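
The pattern the article describes can be pictured as a small persistence layer the agent reads at session start and writes at session end. This is a generic sketch of file-system-based memory, not Anthropic's implementation; the file name and schema are invented for illustration.

```python
# Conceptual sketch of file-system-based agent memory: persist session notes
# as JSON on disk so the next session can build on prior work without
# re-processing. Illustrative only; not Anthropic's implementation.
import json
from pathlib import Path

MEMORY_PATH = Path("agent_memory.json")  # hypothetical location

def load_memory() -> dict:
    """Read prior session state, or start fresh if none exists."""
    if MEMORY_PATH.exists():
        return json.loads(MEMORY_PATH.read_text())
    return {"notes": []}

def save_memory(memory: dict) -> None:
    """Persist state so the next session can carry this context forward."""
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

memory = load_memory()
memory["notes"].append("Refactored the auth module; follow up on flaky test")
save_memory(memory)
```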

Safety and the Mythos Context

Anthropic maintains that Opus 4.7's safety profile is comparable to Opus 4.6, with improvements in honesty and prompt injection resistance. Anthropic's own assessment: the model is "largely well-aligned and trustworthy, though not fully ideal in its behavior."

Opus 4.7 is Anthropic's most capable **publicly available** model, but it still trails Claude Mythos Preview internally — the model powering Project Glasswing's zero-day vulnerability research, available only to vetted security partners. Opus 4.7 is the frontier for general deployment; Mythos Preview remains in a separate, safety-gated category.

The Competitive Landscape on April 16, 2026

With Opus 4.7 entering the market today, all four major frontier AI labs — Anthropic, OpenAI, Google, and Meta — now have capable models competing for the same enterprise and developer workloads. The competition is producing fast, meaningful improvements across every dimension of capability. For teams evaluating frontier AI: on agentic coding and vision tasks in particular, the April 16 numbers make a strong case that Opus 4.7 sets the current performance standard.

Sources: Anthropic (April 16, 2026), 9to5Mac (April 16, 2026), OfficeChai (April 16, 2026), APIdog (April 16, 2026), GitHub Changelog (April 16, 2026), CNBC (April 16, 2026)