Claude Sonnet 5 vs Sonnet 4.6: Performance, Cost & Features

For months, developers building agents had a choice: pay Opus prices for reliable autonomy, or use Sonnet and babysit the model through every step it struggled to finish.

Claude Sonnet 5 changes that equation. Anthropic released it on June 30, 2026, calling it the most agentic Sonnet model yet, one that can make plans, use tools like browsers and terminals, and run autonomously at a level that, just a few months ago, required larger and more expensive models.

This is not a checkpoint refresh. It is the first time a Sonnet-class model has genuinely threatened Opus performance on the benchmarks that matter to real agentic workflows.

What Made Sonnet 4.6 Hit Its Ceiling

Sonnet 4.6 launched in February 2026 and was already impressive for its price. It scored 79.6% on SWE-bench Verified for agentic coding, and 72.5% on OSWorld-Verified for computer use. Developers chose it over the previous flagship Opus 4.5 most of the time.

But agentic reliability was where it showed cracks. Sonnet 4.6 could plan a task. It often could not finish one. Complex multi-step jobs stalled halfway. It needed human nudging at points that a truly autonomous agent should handle alone. The clearest gains in agentic capability had shifted to the Opus class, leaving Sonnet users with a frustrating gap between what the model could start and what it could complete.

Sonnet 5 is Anthropic's direct answer to that gap.

The Head-to-Head Prompt: Build the Same UI in Sonnet 5 and Sonnet 4.6

Here is the exact prompt both models received:

Build a single HTML file. A watermelon sits on a wooden post in a field.
A man aims a gun at it and shoots. The watermelon explodes into chunks,
juice, and seeds flying everywhere. Animate the full sequence: aim, shoot,
explosion, debris falling. Pure HTML, CSS, vanilla JS only. No libraries.
Single file. Must run in browser.

No design spec. No color palette. No character description. No animation timing. Every visual decision was left entirely to the model.

Sonnet 5 built an illustrated world

Rolling green hills with grass texture, a gradient sky, volumetric clouds with soft shadows, a glowing sun, and a fully detailed character: red cap, beard, blue jacket, brown boots, rifle with a visible barrel. The watermelon sits on its post with a red pulsing crosshair locked onto it. A styled "Taking aim..." title card appears in bold red. The scene has depth, timing, and personality. Nothing in the prompt asked for any of this.

Sonnet 4.6 built the literal prompt

Flat blue sky, a solid ground block, a geometric stick figure with a hat, rectangular limbs, and a gun-shaped rectangle. The watermelon is present. The post is present. The "Again!" replay button appears quickly, because the animation sequence itself was thin. It answered every word in the prompt and invented nothing beyond it.

The gap is not about which model followed instructions better. Both did. The gap is about what the model does with the space the prompt does not fill.

Sonnet 5 treated the silence in the prompt as creative latitude. It added hills because a field needs terrain. It added a crosshair because aim needs a visual. It added character detail because a man with a gun is more interesting than a blue rectangle with a black rectangle attached.

Sonnet 4.6 treated the silence as nothing. It rendered the nouns (man, gun, watermelon, post, field) and stopped exactly where the prompt stopped.

That difference is what Anthropic means when they describe Sonnet 5 as more agentic. It does not just execute. It makes decisions about what a good output looks like before you tell it what good means.

The Architecture Shift: Effort Levels and Agentic Design

Sonnet 5 is not just a better version of Sonnet 4.6. It introduces a structural feature that changes how you deploy it.

Sonnet 5 exposes selectable reasoning effort: low, medium, high, max, and extra-high (xhigh). Higher effort means more tokens spent on reasoning before the model acts.

This is meaningful because agent tasks are not uniform. A simple file rename does not need the same reasoning depth as debugging a race condition across three services.
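One way to act on this is to choose the effort level per task rather than per deployment. The sketch below is a hypothetical heuristic: the `Task` fields and the thresholds are illustrative assumptions, not anything Anthropic documents, and only the level names come from the list above. How a level is actually passed to the API or CLI is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # All three fields are hypothetical signals chosen for illustration.
    files_touched: int       # how many files the task spans
    services_involved: int   # how many services interact in the task
    needs_debugging: bool    # diagnosis work rather than a mechanical edit


def choose_effort(task: Task) -> str:
    """Map rough task signals to an effort level (illustrative thresholds)."""
    if task.needs_debugging and task.services_involved > 1:
        return "high"    # e.g. a race condition across several services
    if task.needs_debugging or task.files_touched > 5:
        return "medium"  # some diagnosis, or a wide but mechanical change
    return "low"         # e.g. a simple file rename
```

A file rename (`Task(1, 1, False)`) maps to "low", while a cross-service debugging job (`Task(3, 3, True)`) maps to "high". In practice the thresholds would be tuned against your own task mix.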

[Figure: cost-performance charts for BrowseComp and OSWorld-Verified]

Sonnet 5 is a strict improvement over Sonnet 4.6 at every effort level on both BrowseComp (agentic web search) and OSWorld-Verified (computer use). Sonnet 4.6 fell well short of Opus 4.8 on these evaluations. Sonnet 5 and Opus 4.8 now sit on one continuous cost-performance range, with Sonnet 5 at the lower-cost end and Opus 4.8 at the higher-accuracy end.

At its maximum extra-high effort setting, Sonnet 5 performs roughly in line with Opus 4.8 at a medium-to-high effort setting on OSWorld and BrowseComp. The catch is cost: running Sonnet 5 at xhigh can exceed Opus 4.8's cost at a comparable accuracy point. The model is a dial now, not a fixed tier.
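Treating the model as a dial suggests a simple selection rule: measure accuracy and cost per task for each model and effort setting on your own evaluation set, then take the cheapest configuration that clears your accuracy target. The sketch below assumes you supply those measurements yourself; the `Measurement` type and function name are hypothetical, and no benchmark figures are built in.

```python
from typing import NamedTuple, Optional, Sequence


class Measurement(NamedTuple):
    model: str            # label for the model you tested
    effort: str           # effort setting used for the run
    accuracy: float       # fraction of your eval tasks solved (0.0 to 1.0)
    cost_per_task: float  # measured dollars per task


def cheapest_meeting_target(
    points: Sequence[Measurement], target: float
) -> Optional[Measurement]:
    """Return the lowest-cost measurement whose accuracy meets the target."""
    eligible = [p for p in points if p.accuracy >= target]
    if not eligible:
        return None  # nothing you measured is accurate enough
    return min(eligible, key=lambda p: p.cost_per_task)
```

This makes the xhigh caveat concrete: if a lower Opus effort setting reaches the same target for less money in your measurements, the function returns it instead of the Sonnet configuration.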

Sonnet 5 vs Sonnet 4.6: The Numbers

[Figure: benchmark comparison]

Every benchmark Anthropic published shows Sonnet 5 ahead of Sonnet 4.6. The gap is real, not marginal.

On agentic coding (SWE-bench Pro), Sonnet 5 scores 63.2% against Sonnet 4.6's 58.1%, a 5-point jump with Opus 4.8 still leading at 69.2%. On computer use (OSWorld-Verified), Sonnet 5 reaches 81.2%, a clear step up from Sonnet 4.6's revised score of 78.5%. On Humanity's Last Exam with tools, Sonnet 5 scores 57.4% versus Sonnet 4.6's 46.8%.

The most striking number is knowledge work. On the GDPval-AA v2 knowledge work benchmark, Sonnet 5 scores 1,618 against Opus 4.8's 1,615, edging past the flagship on this specific evaluation.
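To check the size of these gaps, the short sketch below computes the absolute and relative gains from the percentages quoted in this section. The dictionary and function names are illustrative; the scores are copied from the text above.

```python
# Scores quoted in this section, in percent: (Sonnet 4.6, Sonnet 5).
SCORES = {
    "SWE-bench Pro": (58.1, 63.2),
    "OSWorld-Verified": (78.5, 81.2),
    "Humanity's Last Exam (with tools)": (46.8, 57.4),
}


def gains(scores):
    """Return {benchmark: (point_gain, relative_gain_percent)}, rounded to 0.1."""
    result = {}
    for name, (old, new) in scores.items():
        point_gain = round(new - old, 1)
        relative_gain = round((new - old) / old * 100, 1)
        result[name] = (point_gain, relative_gain)
    return result


if __name__ == "__main__":
    for name, (points, relative) in gains(SCORES).items():
        print(f"{name}: +{points} points ({relative}% relative)")
```

The point gains come out to 5.1 on SWE-bench Pro, 2.7 on OSWorld-Verified, and 10.6 on Humanity's Last Exam with tools, so the largest jump is on the tool-assisted reasoning benchmark rather than on coding.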

How to Use Claude Sonnet 5

Via the API:

import anthropic

# Passing api_key is optional: without it the SDK reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic(api_key="YOUR_API_KEY")

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=8192,
    messages=[
        {
            "role": "user",
            "content": "Audit this Python module for race conditions and write a reproducing test."
        }
    ]
)
print(response.content[0].text)

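With a larger reasoning budget (this example sets an extended-thinking token budget rather than a named effort level):

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # cap on reasoning tokens; must be below max_tokens
    },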
    messages=[
        {
            "role": "user",
            "content": "Refactor this entire authentication module to be async."
        }
    ]
)
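When thinking is enabled, the response content can begin with a thinking block rather than a text block, so the `response.content[0].text` pattern from the first example may not find the answer. The helper below is a minimal sketch that keeps only text blocks. It assumes each block exposes a `type` attribute and, for text blocks, a `text` attribute, as in the Anthropic Python SDK's Messages API; `collect_text` is a hypothetical name.

```python
def collect_text(content_blocks) -> str:
    """Join the text of every block whose type is "text", skipping other block types."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )
```

With the call above, `print(collect_text(response.content))` prints the final answer and ignores the reasoning blocks.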

In Claude Code (terminal):

# Set Sonnet 5 as active model
claude config set model claude-sonnet-5

# Run with high effort on a complex task
claude --effort high "Fix all TypeScript errors in src/ and add missing types"


Sonnet 5 is available across all Claude plans: it is the default model for Free and Pro, and is available to Max, Team, and Enterprise users. It runs on Claude Code, natively on the Claude Platform, and on Amazon Bedrock and Microsoft Foundry. Google Vertex AI is listed as coming soon.

When to Use Sonnet 5 vs Sonnet 4.6 vs Opus 4.8

Use Case                                             Best Model
Multi-step coding agents, brownfield refactors       Sonnet 5
High-volume production pipelines, cost-sensitive     Sonnet 5 (intro pricing)
Long-horizon computer use, browser automation        Sonnet 5 at high effort
Hardest SWE tasks, exploit-level security work       Opus 4.8
Light tasks, API prototypes, fast responses          Sonnet 4.6 (still valid)
Frontier research, maximum accuracy budget           Opus 4.8

Pricing: The Real Math

Sonnet 5 launches at $2 per million input tokens and $10 per million output tokens through August 31, 2026, then moves to standard pricing of $3 per million input tokens and $15 per million output tokens. Opus 4.8 is priced at $5 per million input tokens and $25 per million output tokens.

At the introductory rate, that sounds like a clean 2.5x discount against Opus 4.8. It is not quite that simple.

Sonnet 5 uses the tokenizer introduced with Opus 4.7. The same text maps to roughly 1.0 to 1.35 times as many tokens as it did on Sonnet 4.6, depending on content type. Standard prompt caching applies at 0.1x input cost. A 50% Batch API discount is also available.

The introductory pricing is set to make the transition from Sonnet 4.6 roughly cost-neutral. Do not assume the sticker discount is your actual savings without running your own token count first.
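The arithmetic can be sketched as a small cost estimator. The prices, the 1.0 to 1.35 tokenizer range, the 0.1x cache rate, and the 50% batch discount come from the paragraphs above. Everything else is an assumption made for illustration: that token counts were measured with the Sonnet 4.6 tokenizer, that the multiplier applies equally to input and output, that cache-write charges are ignored, and that the batch and cache discounts stack. The tier keys are labels for this example, not API model IDs.

```python
# Dollars per million tokens (input, output), as quoted above.
PRICES = {
    "sonnet-5-intro": (2.0, 10.0),
    "sonnet-5-standard": (3.0, 15.0),
    "opus-4.8": (5.0, 25.0),
}


def request_cost(tier, input_tokens, output_tokens,
                 token_multiplier=1.0, cached_fraction=0.0, batch=False):
    """Estimate dollars for a workload under the assumptions stated above."""
    in_price, out_price = PRICES[tier]
    in_tok = input_tokens * token_multiplier    # tokenizer inflation (assumed uniform)
    out_tok = output_tokens * token_multiplier
    cached = in_tok * cached_fraction           # input served from cache at 0.1x
    fresh = in_tok - cached
    cost = (fresh * in_price + cached * in_price * 0.1 + out_tok * out_price) / 1_000_000
    return cost * 0.5 if batch else cost        # 50% Batch API discount (assumed to stack)
```

At the upper end of the tokenizer range, one million input tokens counted on the old tokenizer cost `request_cost("sonnet-5-intro", 1_000_000, 0, token_multiplier=1.35)`, about $2.70, and about $4.05 on the standard tier. Running your real token counts through a function like this is what shows whether the sticker discount survives.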

Safety: Safer Than Sonnet 4.6, Behind Opus on the Hardest Tests

[Figure: Claude Sonnet 4.6 vs Sonnet 5 vs Opus 4.8 vs Mythos Preview on the behavioral audit]

More agentic power without tighter safety is a problem. Anthropic addressed this directly.

Pre-deployment evaluations found Sonnet 5 is safer than Sonnet 4.6. It refuses malicious requests more reliably, resists prompt-injection hijacks better, and shows lower rates of hallucination and sycophancy.

On the automated behavioral audit, which tests for a wide range of misaligned behaviors including cooperation with misuse and deception, Sonnet 5 scored lower than Sonnet 4.6. That means fewer undesirable behaviors overall. However, it shows somewhat higher rates than the more capable Opus 4.8 and Claude Mythos Preview.

On cybersecurity: Anthropic did not deliberately train Sonnet 5 on cybersecurity tasks. In a test building working exploits for Firefox 147 vulnerabilities, Sonnet 5 never produced a full working exploit (0%). It shows a slightly higher partial success rate than Sonnet 4.6, likely from general intelligence gains rather than specific training.

Sonnet 5 launches with cyber safeguards enabled by default. These are the same safeguards present in Opus 4.7 and 4.8, detecting and blocking dangerous cyber usage in real time.

What Real Agentic Behavior Looks Like

Benchmark numbers tell one story. What engineers actually observed during early access tells another.

Testers described a model that finishes complex tasks where previous Sonnet models would stop short, that checks its own output without being asked, and that does all of this agentic work at an attractive price point.

One Rust engineer at Zed reported: the model found a bug, wrote a reproducing test, fixed it, stashed the fix, and confirmed the bug returned without it, all in a single pass, without a prompt asking it to do any of those steps. That is not instruction following. That is autonomous quality control.
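The check the model performed has a simple logical core: a reproducing test is only trustworthy if it fails on the buggy code and passes on the fixed code. The sketch below illustrates that rule in Python with a toy bug. The function names and the clamp example are hypothetical and are not taken from the Zed report.

```python
from typing import Callable


def is_valid_regression_test(
    test: Callable[[Callable], bool], buggy: Callable, fixed: Callable
) -> bool:
    """True if the test fails against the buggy code and passes against the fix."""
    return (not test(buggy)) and test(fixed)


# Toy example: a clamp function that forgets the upper bound.
def buggy_clamp(x, lo, hi):
    return max(lo, x)


def fixed_clamp(x, lo, hi):
    return min(hi, max(lo, x))


def reproduces_bug(clamp) -> bool:
    return clamp(15, 0, 10) == 10  # exercises the missing upper bound


def misses_bug(clamp) -> bool:
    return clamp(5, 0, 10) == 5    # passes on both versions, so it proves nothing
```

Here `is_valid_regression_test(reproduces_bug, buggy_clamp, fixed_clamp)` is `True`, while the same call with `misses_bug` is `False`. Stashing the fix and re-running the test, as the engineer described, is the version-control equivalent of the first half of that check.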

Anthropic describes Sonnet 5 as offering superior instruction following, tool selection, and error correction for autonomous workflows, reliably handling complex, multi-step tasks that require sustained coherence and adaptive decision-making.

The pattern across early access partners is consistent. Sonnet 5 excels at finishing complex tasks where previous versions stopped short, checks its own output without being prompted, and does this at a price that makes the choice easy.

Conclusion

Sonnet 4.6 was a good model that ran out of road on the tasks that matter most in 2026: long-horizon agents, autonomous debugging, and multi-tool orchestration without a human in the loop.

Sonnet 5 confirms that agentic capability is now the baseline expectation at every price tier. The differentiator is no longer who can do agentic work, but how cheaply and how reliably it can be done without human oversight.

The benchmarks are clear. The real-world testimonials from early access partners are consistent. Sonnet 5 sits clearly above Sonnet 4.6 across every tested category, and clearly below Opus 4.8 on everything except knowledge work, where it matches or nudges ahead.

If you were using Sonnet 4.6 because Opus felt like overkill, Sonnet 5 is the model you were waiting for. If you were using Opus because Sonnet kept stalling, test Sonnet 5 at high effort first before renewing that commitment.

The gap just got a lot smaller, and that changes the math for almost every production agent running today.

FAQs

Q1. What are the biggest improvements in Claude Sonnet 5 over Sonnet 4.6?

Claude Sonnet 5 introduces stronger agentic reasoning, improved multi-step task completion, selectable reasoning effort levels, better coding performance, and enhanced tool use, making it far more autonomous than Sonnet 4.6.

Q2. Should I choose Claude Sonnet 5 or Opus 4.8 for production AI agents?

Claude Sonnet 5 is ideal for most production AI agents because it delivers near-Opus performance at a lower cost at most effort settings, though at xhigh its cost can exceed Opus 4.8's at a comparable accuracy point. Opus 4.8 remains the better choice for frontier research and the most demanding reasoning tasks.

Q3. Is Claude Sonnet 5 more cost-effective than Sonnet 4.6?

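In most cases, but check your own token counts. The newer tokenizer can produce up to about 1.35 times as many tokens for the same text, and the introductory pricing is set to make the move from Sonnet 4.6 roughly cost-neutral. What you gain at that price is better performance per dollar and improved agent reliability.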