Claude Opus 4.7 vs Opus 4.6: What Actually Changed?
Claude Opus 4.7 delivers major upgrades in coding, vision, and instruction precision. Learn how it compares to previous models, what changed, and why developers must rethink their prompts before upgrading.
Anthropic released Claude Opus 4.7 on April 16, 2026. It is the third Opus release in six months, which could make it easy to overlook. It should not be.
This is not a minor patch. The coding scores jumped hard. Vision resolution tripled. Instruction following became strict enough that prompts written for Opus 4.6 will break.
These are real changes that affect how developers build with the model and what non-technical users can do with it. Some of it is better. Some of it requires work. All of it is worth understanding before you decide whether to upgrade.
What Is Claude Opus 4.7?
Opus 4.7 is a direct upgrade to Opus 4.6, continuing Anthropic's roughly two-month release cadence. It handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and verifies its own outputs before reporting back.
Think of it less like a new car and more like the same car with a rebuilt engine, better headlights, and a GPS that actually works. The model you knew is still there. It just does the hard parts better.
The update brings a 13% lift on coding benchmarks, 3x more production tasks resolved, high-resolution vision support up to 3.75 megapixels, and a new tokenizer. Pricing stays the same as Opus 4.6: $5 per million input tokens and $25 per million output tokens.
It is available today across Claude.ai, the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. The model ID is claude-opus-4-7.
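Getting started is a one-line change if you already call the Messages API. The sketch below shows a minimal request body using the new model ID from this release; the request shape follows the standard Anthropic Messages API, and the commented-out call assumes you have the `anthropic` package and an API key configured.

```python
# Minimal request payload for Claude Opus 4.7 via the Messages API.
# The model ID comes from the release notes; everything else is the
# standard Messages API request shape.
import json

payload = {
    "model": "claude-opus-4-7",  # new model ID for this release
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Summarize this diff in two sentences."}
    ],
}

# To actually send the request (requires the `anthropic` package and a key):
#   from anthropic import Anthropic
#   client = Anthropic()
#   response = client.messages.create(**payload)
#   print(response.content[0].text)

print(json.dumps(payload, indent=2))
```

On Bedrock, Vertex AI, and Foundry the model identifier follows each platform's own naming convention, so check the platform catalog rather than reusing the API ID verbatim.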
The Benchmark Comparison
Benchmarks only tell part of the story. But this model's numbers are strong enough to be worth laying out clearly.
Coding
On SWE-bench Pro, Opus 4.7 scores 64.3%, up from 53.4% on Opus 4.6 and well ahead of GPT-5.4 at 57.7% and Gemini 3.1 Pro at 54.2%. On SWE-bench Verified, the score is 87.6%, compared with 80.8% for its predecessor and 80.6% for Gemini 3.1 Pro.
On CursorBench, Opus 4.7 scores 70%, up 12 points over Opus 4.6's 58%. For developers who work inside Cursor or Claude Code every day, this is the number that matters most.
Vision
The biggest single-benchmark improvement in the full table is vision. Opus 4.7 scores 79.5% on visual navigation without tools versus 57.7% for Opus 4.6. One early-access partner testing computer vision for autonomous workflows saw visual acuity jump from 54.5% to 98.5%.
Graduate-Level Reasoning
On GPQA Diamond, which measures graduate-level reasoning, the field has converged. Opus 4.7 scores 94.2%, GPT-5.4 Pro scores 94.4%, and Gemini 3.1 Pro scores 94.3%. The differences are within noise. At this level, raw reasoning scores no longer separate the models. What matters is what a model does with that reasoning across long, messy, real-world tasks.
Where It Falls Short
Two areas are honest regressions worth knowing about. Terminal-Bench 2.0 is one: GPT-5.4 scores 75.1% there versus Opus 4.7's 69.4%. BrowseComp also softens compared to Opus 4.6. If terminal automation or agentic web search is your primary use case, this is worth testing before you fully migrate.
The Context
These results put Opus 4.7 ahead of Opus 4.6, GPT-5.4, and Gemini 3.1 Pro, but behind the more broadly capable Claude Mythos Preview. However, Mythos isn't generally available; Anthropic is sharing it only with key platform partners. Opus 4.7 is the best model you can actually use today.
Instruction Following: The Change That Will Break Your Old Prompts
This is the quietest change in Opus 4.7 and the one most likely to cause problems in production.
Opus 4.7 is noticeably more literal in following instructions, a behavior shift that will break prompts tuned for 4.6. Where Opus 4.6 interpreted instructions loosely and sometimes skipped steps, Opus 4.7 takes them precisely.
Here is what that looks like in real terms. If your prompt says "respond in JSON," previous models might add a prose preamble before the JSON block. Opus 4.7 returns JSON and nothing else. If your prompt says "write exactly 3 functions," Opus 4.7 writes exactly 3, even if 4 would be more elegant.
This behavior is better for production systems. It makes outputs predictable. But any prompt written for an earlier Claude model that relied on the model's judgment to fill in gaps will need to be reviewed and rewritten.
Anthropic explicitly warns developers to re-tune their prompts. Do not skip this step.
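A quick way to start that audit is to look for vague qualifiers that Opus 4.6 tolerated but that leave a literal model with no explicit instruction to follow. The prompts and the marker list below are hypothetical examples, not an official checklist, but they show the before/after shape of a re-tune:

```python
# Re-tuning a prompt for stricter instruction following.
# The "loose" version relied on the model's judgment; the "strict"
# version states every expectation explicitly so Opus 4.7's literal
# reading works in your favor. Prompt text is a made-up example.

loose_prompt = "Summarize the report. JSON would be nice."

strict_prompt = (
    "Summarize the report in JSON with exactly two keys: "
    "'summary' (a string of at most 50 words) and "
    "'risks' (a list of at most 3 strings). "
    "Return only the JSON object, with no prose before or after it."
)

# Audit heuristic: flag soft language that leaves output format to chance.
VAGUE_MARKERS = ("would be nice", "maybe", "if possible", "try to")
needs_rewrite = any(m in loose_prompt.lower() for m in VAGUE_MARKERS)
print("loose prompt needs rewrite:", needs_rewrite)
```

The design choice here is to treat strictness as an asset: since the model now does exactly what you say, say exactly what you want.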
Trying Opus 4.7
To see how Opus 4.7 handles a creative technical task, I gave it a single prompt: build a one-file animation of a planet with orbital trails and particle explosions.
It returned a single working file on the first attempt. The planet, the trails, the particle explosions, all there, exactly as described. Nothing was approximated or skipped. The code was clean enough to use without editing.
New Features in Claude Opus 4.7
xhigh Effort Level
The effort parameter now includes an xhigh level, sitting above the existing high, medium, and low levels. At xhigh, the model spends significantly more tokens on internal reasoning, resulting in better outputs for complex problems. In Claude Code, Anthropic has set xhigh as the default effort level for all plans.
Real-world costs can rise 10-40% at equivalent workloads when using xhigh effort mode, due to the new tokenizer and increased output tokens. But the quality gains on hard coding and analytical tasks are real.
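The release notes name the `effort` parameter and the `xhigh` level; exactly where the parameter sits in the request body is an assumption in the sketch below, so check the API reference for your SDK version before relying on it:

```python
# Sketch of requesting xhigh effort on a hard task. The top-level
# placement of "effort" is assumed, not confirmed; the level names
# (low/medium/high/xhigh) come from the release notes.
request = {
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    "effort": "xhigh",  # assumed placement; reserve for genuinely hard tasks
    "messages": [
        {"role": "user", "content": "Refactor this module for testability."}
    ],
}

# Cost note from the release: expect roughly 10-40% higher real-world
# cost at equivalent workloads, from the new tokenizer plus extra
# reasoning tokens. Default to a lower level for routine calls.
print(request["effort"])
```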
Task Budgets (Public Beta)
Task budgets solve a problem that anyone building agents has hit: how do you prevent a multi-turn agentic loop from consuming an unbounded number of tokens? With task budgets, you give Claude a rough token target for the entire loop. The model sees a running countdown and uses it to prioritize work, skip low-value steps, and finish gracefully as the budget runs out.
This is a meaningful step toward agents that behave sensibly in production, rather than running until they run out of context.
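In practice that looks like one extra field on the request plus, ideally, a client-side guard so the loop also stops locally if spend ever exceeds the target. The field name `task_budget_tokens` below is a hypothetical placeholder standing in for whatever the public-beta docs specify:

```python
# Sketch of a task-budget request plus a client-side guard.
# `task_budget_tokens` is a hypothetical field name; consult the
# public-beta documentation for the real parameter.
request = {
    "model": "claude-opus-4-7",
    "max_tokens": 2048,
    "task_budget_tokens": 50_000,  # hypothetical: rough target for the whole loop
    "messages": [{"role": "user", "content": "Triage these 40 failing tests."}],
}

def within_budget(spent_tokens: int, budget: int = 50_000) -> bool:
    """Return True while the agentic loop still has budget left to spend."""
    return spent_tokens < budget

print(within_budget(12_000))  # True: early in the loop
print(within_budget(55_000))  # False: budget exhausted, finish gracefully
```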
"/ultrareview" in Claude Code
Claude Code gains the new /ultrareview command, which runs a dedicated review session that reads through your changes and flags what a careful reviewer would catch. Anthropic is giving Pro and Max Claude Code users three free ultrareviews to try it. Auto mode, which lets Claude make decisions on your behalf during long tasks, is also now available to Max plan subscribers.
What About Safety?
Opus 4.7 arrives at a moment when Anthropic is thinking carefully about where AI capabilities should and should not go. Anthropic has launched a Cyber Verification Program that allows security professionals to apply for exceptions to the built-in safeguards. The model is the first Claude release with production cybersecurity safeguards, tested here before eventually rolling out to Mythos-class models.
On the broader alignment picture, the official system card states the model is "largely well-aligned and trustworthy, though not fully ideal in its behavior." It reports low rates of deception and sycophancy, and strong resistance to prompt injection attacks. Claude Mythos Preview still leads on alignment scores, but Opus 4.7 is a solid step in the right direction.
What It Costs and How to Migrate
The pricing is unchanged from Opus 4.6. Developers pay $5 per million input tokens and $25 per million output tokens. The model ID is claude-opus-4-7.
The main migration consideration is token usage. The new tokenizer can map the same input to 1.0-1.35× more tokens than Opus 4.6, and xhigh effort mode uses substantially more output tokens. Anthropic recommends measuring token usage on real traffic before migrating at scale.
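The cleanest way to measure this is to count tokens for the same traffic under both models and compare. The Messages API exposes a token-counting endpoint for this; the commented call below follows the standard SDK shape (the old model ID shown is a hypothetical placeholder), while the ratio helper is a trivial local calculation:

```python
# Measuring tokenizer inflation before migrating at scale.
# With the `anthropic` SDK and an API key you would count the same
# messages under both models, e.g.:
#   from anthropic import Anthropic
#   client = Anthropic()
#   old = client.messages.count_tokens(model="claude-opus-4-6", messages=msgs)  # hypothetical old ID
#   new = client.messages.count_tokens(model="claude-opus-4-7", messages=msgs)
#   ratio = new.input_tokens / old.input_tokens

def tokenizer_ratio(new_tokens: int, old_tokens: int) -> float:
    """Input-token inflation factor between the two tokenizers."""
    return new_tokens / old_tokens

# The release says the same input can map to 1.0-1.35x more tokens:
ratio = tokenizer_ratio(1200, 1000)
print(f"{ratio:.2f}x")  # 1.20x
```

Run this over a representative sample of real prompts rather than a handful of test strings; short and long inputs can inflate differently.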
Box's evaluations found that Opus 4.7 had a 56% reduction in model calls and a 50% reduction in tool calls compared to Opus 4.6 on certain workflows, meaning the model often gets tasks done in fewer steps, which can offset the higher per-token count.
The upgrade path is clear: change your model identifier, audit your prompts for loose instruction language, and monitor token usage for the first few days. Anthropic has published a migration guide with more detail.
Conclusion
Claude Opus 4.7 is not a revolution. It is something more useful than that right now, a reliable, measurable improvement in the areas that cost teams the most time: complex coding, long multi-step tasks, and working with rich visual content.
The benchmark wins are real. The instruction-following change is real and will require work. The vision upgrade opens use cases that were genuinely not possible before.
For Anthropic, the release reinforces the position that has driven its extraordinary revenue growth. Claude is the model that developers and enterprises reach for when they need reliable, high-quality output on complex work. Opus 4.7 extends that lead at a moment when the company's commercial trajectory depends on it. The competition is close, and closing. But for now, on the tasks that generate the most revenue, Anthropic has the best model on the market.
FAQs
Q1. What makes Claude Opus 4.7 different from previous versions?
Claude Opus 4.7 brings significant improvements in coding performance, instruction following, and high-resolution vision capabilities, making it more reliable for complex, real-world tasks.
Q2. Do I need to update my prompts for Opus 4.7?
Yes, Opus 4.7 follows instructions much more strictly, so prompts designed for earlier versions may break and should be rewritten for precise outputs.
Q3. Is Claude Opus 4.7 worth upgrading for developers?
Yes, especially for developers working on coding-heavy or multi-step tasks, as it delivers higher accuracy, fewer iterations, and better task completion.