Anthropic Bolsters AI Model with Enhanced Reasoning, Security Features

"Our safety assessments found that Sonnet 5 shows an overall lower rate of undesirable behaviors than Sonnet 4.6, and is generally safer to use in agentic contexts," Anthropic asserted in an introductory blog post.

Anthropic frames Sonnet 5 as a safer, more agentic mid-sized model

Anthropic’s new mid-sized model, Sonnet 5, is presented as the company's most “agentic” release to date and, the company says, demonstrably safer than its immediate predecessor. According to Anthropic’s System Card and blog post, Sonnet 5 is less likely to hallucinate, less sycophantic, and better at refusing malicious requests and resisting prompt-injection attempts than Sonnet 4.6. The company also claims the model is more aware of user misuse and deception and can block both, and that internal and third‑party benchmarks show an “overall lower rate of undesirable behaviors.”

Agentic performance and a new “effort” control for long-horizon tasks

Anthropic pitches Sonnet 5 at developers building agents to automate recurring, multi-step work. The company said the model shows clear gains over Sonnet 4.6 in coding, agentic search, multimodal reasoning and professional-task performance, and is better at sustaining attention across multi-part jobs, which Anthropic describes as “long horizon tasks.” The Sonnet 5 release adds a new setting to scale the model’s effort: lower settings, which use fewer tokens, for simple tasks, and higher settings, labelled “xhigh” and “max,” for prolonged agentic workloads. Anthropic highlighted a testimonial from a Zapier engineer in which Sonnet 5 completed an end-to-end, two-part job (updating a contact database and sending a notice) that earlier Sonnet models could not.

How Sonnet 5 stacks up against Opus and Mythos — and the pricing trade-offs

Anthropic’s benchmarks indicate Sonnet 5 closes the gap with the company’s flagship enterprise models but still trails Opus and Mythos on several measures. The company positions Sonnet 5 as a more cost‑effective option for many tasks: Opus is priced at $5 per million input tokens and $25 per million output tokens, while Sonnet users will pay $3 per million input tokens and $15 per million output tokens starting in September. Until then, Anthropic is running a promotional rate through the end of August of $2 per million input tokens and $10 per million output tokens. Sonnet is set as the default model for Claude Free and Pro users and is also available to Max, Team, and Enterprise customers, a combination Anthropic suggests will let users trim token budgets by routing work through Sonnet instead of Opus.
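To put those list prices in concrete terms, the short Python sketch below works through the arithmetic for a single workload. The per-token rates are the ones quoted above; the workload itself (40 million input tokens and 8 million output tokens) is a made-up figure chosen purely for illustration, and the names `PRICES` and `job_cost` are hypothetical helpers, not part of any Anthropic tooling.

```python
# Prices in USD per million tokens, as quoted in the article.
PRICES = {
    "opus": {"input": 5.00, "output": 25.00},
    "sonnet_standard": {"input": 3.00, "output": 15.00},  # from September
    "sonnet_promo": {"input": 2.00, "output": 10.00},     # through end of August
}


def job_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the given pricing tier."""
    rates = PRICES[tier]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


# Hypothetical workload, assumed for illustration only.
INPUT_TOKENS = 40_000_000
OUTPUT_TOKENS = 8_000_000

if __name__ == "__main__":
    opus = job_cost("opus", INPUT_TOKENS, OUTPUT_TOKENS)
    for tier in PRICES:
        cost = job_cost(tier, INPUT_TOKENS, OUTPUT_TOKENS)
        saving = 1 - cost / opus
        print(f"{tier:16s} ${cost:8.2f}  ({saving:.0%} below Opus)")
```

Because both Sonnet rates are the same fraction of the Opus rates on inputs and outputs alike (three-fifths at the standard price, two-fifths at the promotional price), the saving relative to Opus comes out to 40 percent and 60 percent respectively, whatever the mix of input and output tokens. That comparison assumes the same token counts on both models, which may not hold in practice.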

Cybersecurity guardrails amid a Washington export-control backdrop

Anthropic made a point of saying it “did not deliberately train Sonnet 5 on cybersecurity tasks.” The company also emphasized guardrails: when instructed to write a Firefox exploit, Sonnet 5 failed to complete the offensive task, although it progressed farther than Sonnet 4.6. Anthropic attributed that difference to “improvements in general intelligence rather than specific training.” The comment arrives in the shadow of a June export-control action: the US Commerce Department temporarily restricted foreign access to Anthropic’s newly released Mythos 5 and Fable 5 models, citing national security concerns. Anthropic’s public framing of Sonnet 5 reads like an attempt to expand capability without rekindling regulatory scrutiny.

How developers, enterprises, and policymakers are likely to respond

Developers and security teams: Developers building agents will test Sonnet 5’s long‑horizon abilities and use the new effort settings to manage token budgets; security teams will note the model’s refusal behavior and the failed exploit attempt but are likely to continue probing capabilities and guardrails in controlled environments.
Enterprises and procurement leaders: Organizations weighing cost and performance may shift routine, multi‑step workloads to Sonnet to save on token costs, especially while the promotional pricing runs through the end of August and before the September price change.
Policymakers and regulators: Given the Commerce Department’s June export-control restriction on Mythos 5 and Fable 5, regulators will likely watch whether Anthropic’s emphasis on guardrails and its public statements satisfactorily separate Sonnet’s capabilities from the higher-profile concerns attached to other models.

Anthropic has rolled Sonnet 5 into its product mix as a middle path: more capable and agentic than Sonnet 4.6, marketed as safer, and priced to sit between free offerings and the company’s Opus tier. The company’s explicit distancing from deliberate cybersecurity training — set against a recent Commerce Department restriction — frames Sonnet 5 as an experiment in balancing capability, cost, and regulatory caution. Whether that balance will satisfy customers seeking powerful agentic tools or regulators watching model misuse remains the question Anthropic has put on the table.
