Anthropic Unveils Dual AI Models, Fable 5 and Mythos 5, With Enhanced Cyber Safeguards

"Anthropic calls Mythos 5 the strongest cybersecurity model in the world," the company said — and on June 9 it split that strength into two products so it could put most of it behind a safety wall.

The split: Claude Fable 5 and Claude Mythos 5

Anthropic released Claude Fable 5 to the public on June 9 and, at the same time, reserved the same underlying model as Claude Mythos 5 for a vetted group of cyber defenders and critical infrastructure operators. The practical difference is a layer of classifiers: when Fable 5 detects certain risky requests, it routes the response to Claude Opus 4.8, whereas Mythos 5 keeps the full cyber capabilities available to approved users. Anthropic prices both models at $10 per million input tokens and $50 per million output tokens, less than half the price of the earlier Mythos Preview, and made Fable 5 available through the Claude API immediately. Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost through June 22, after which it moves to usage credits.
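
To make the quoted prices concrete, here is a minimal Python sketch of the per-request arithmetic. The function name and the example token counts are illustrative assumptions, and the sketch ignores any discounts or other billing rules, which the announcement does not describe.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one request at the quoted per-million-token prices."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: 20,000 input tokens and 4,000 output tokens.
example_cost = request_cost_usd(20_000, 4_000)
print(f"${example_cost:.2f}")
```

At these rates output tokens cost five times as much as input tokens, so in this example the 4,000 output tokens cost as much as the 20,000 input tokens.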

How the cyber classifiers operate

The classifiers watch for a set of categories (cyber, biology, chemistry, and distillation) and intervene when a request trips them. Distillation, as Anthropic defines it, means extracting a model's capabilities to train a competing model; Anthropic blocks it to prevent near-frontier capabilities from leaking without safeguards. When Fable 5 flags a request, it does not refuse outright: the output is handed to Opus 4.8 and the user is told the handoff occurred. Anthropic designed the cybersecurity classifier to block not just exploit development but offensive cyber tasks generally, including reconnaissance, discovery, lateral movement, and the agentic steps that make up real attacks.
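
The flag-then-hand-off behavior described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Anthropic's implementation: the `route` function, the `Reply` type, the keyword-based toy classifier, and the notice text are all hypothetical, and the model names are display labels rather than API identifiers.

```python
from dataclasses import dataclass
from typing import Callable, Set

# Categories the article says the classifiers watch for.
WATCHED_CATEGORIES = frozenset({"cyber", "biology", "chemistry", "distillation"})


@dataclass
class Reply:
    model: str        # label of the model that produced the text
    text: str
    handed_off: bool  # True when a flagged request was rerouted
    notice: str = ""  # message shown to the user about the handoff


def route(
    prompt: str,
    classify: Callable[[str], Set[str]],
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Reply:
    """Answer with the primary model unless the classifier flags the prompt."""
    flagged = classify(prompt) & WATCHED_CATEGORIES
    if flagged:
        # Flagged requests are not refused: the fallback model answers
        # and the user is told that the handoff happened.
        notice = "This response was handed off to the fallback model."
        return Reply("Opus 4.8", fallback(prompt), True, notice)
    return Reply("Fable 5", primary(prompt), False)


# Toy stand-ins so the sketch runs; a real classifier is a trained model.
def toy_classify(prompt: str) -> Set[str]:
    return {"cyber"} if "lateral movement" in prompt.lower() else set()


def toy_primary(prompt: str) -> str:
    return "primary answer"


def toy_fallback(prompt: str) -> str:
    return "fallback answer"


reply = route("Plan lateral movement across this network",
              toy_classify, toy_primary, toy_fallback)
print(reply.model, reply.handed_off)
```

Passing the classifier and both models in as plain callables keeps the routing decision separate from how any of the three components is built, so each can be swapped out or tested on its own.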

Red-team testing, jailbreaks, and robustness

Anthropic reports specific test results. In an internal evaluation that configured Fable 5 to block flagged requests rather than fall back, and in which testers did not try to evade the safeguards, the classifiers prevented the model from making any progress on offensive cyber tasks. One external partner found that Fable 5 complied with zero harmful single-turn requests on cyberattack planning, exploit development, or defense evasion, and that it sustained that performance against 30 public jailbreak techniques. A public bug bounty ran for over 1,000 hours and produced no universal jailbreak, prompt, or harness that strips the safeguards wholesale. External red teams likewise found none for long-form agentic tasks, with one exception: the UK's AI Security Institute made progress toward a universal jailbreak within a brief initial testing window. Anthropic concedes that fully preventing universal jailbreaks is likely impossible and says its goal is to make any remaining jailbreaks slow and costly enough to detect before they are used at scale.

Why Anthropic treated Mythos-class capabilities as a special risk

The decision to split the product traces to results from Project Glasswing and Mythos Preview. In April, Anthropic's red team reported that Mythos Preview identified and exploited zero-day vulnerabilities in every major operating system and every major web browser when instructed to do so. The oldest bug it found was a 27-year-old flaw in OpenBSD. In one notable case, the model autonomously wrote a remote code execution exploit against FreeBSD's NFS server from a 17-year-old bug that Anthropic triaged as CVE-2026-4747. Anthropic describes the result as full root for an unauthenticated attacker from anywhere on the internet, while the NVD entry frames kernel code execution as reachable by an attacker able to send packets to the NFS server while the kgssapi.ko module is loaded. Anthropic says these capabilities were not explicitly trained for but emerged as side effects of broad improvements in code, reasoning, and autonomy, the same gains that also make the model better at producing patches.

Cloudflare, Mozilla, maintainers: the defender's new tempo

Project Glasswing's practical outputs underline the defensive imperative. Anthropic and roughly 50 partners used Mythos Preview to find more than ten thousand high- or critical-severity vulnerabilities in systemically important software in the project's first weeks. Cloudflare found 2,000 bugs, 400 of them high- or critical-severity. Mozilla found and fixed 271 bugs in Firefox 150, more than ten times what it caught in Firefox 148 using Opus 4.6. Anthropic says open-source maintainers, already overwhelmed by low-quality AI-generated reports, have asked the company to slow disclosures because they cannot write patches fast enough. Anthropic reports that, within Glasswing, a high- or critical-severity bug found by the model takes about two weeks to patch on average, which shifts the bottleneck from discovery to remediation. Anthropic's N-day experiments showed that Mythos Preview could build working Linux privilege-escalation exploits from a disclosed CVE and its patch in under a day at modest compute cost, sharpening the need for rapid patch deployment.

Data retention, access programs, and the wider question

Anthropic is also changing data handling: it will require 30-day retention for all traffic on Fable 5, Mythos 5, and future models at this capability level, across first- and third-party surfaces. The company says it will not use the data for training or any non-safety purpose, will log all human access, and will delete the data after 30 days except where a safety investigation or legal obligation requires holding it longer. Anthropic has also opened a Cyber Verification Program that lets vetted security professionals use its models for legitimate offensive work without the cyber safeguards, and it plans to widen Mythos 5 access through a trusted-access program. Once compute capacity permits, it aims to fold Fable 5 back into subscription plans without the usage-credit premium that kicks in after June 22.
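
The retention rule described above (delete after 30 days unless a safety investigation or legal obligation applies) can be expressed as a small check. The Python sketch below is an assumption-laden illustration, not Anthropic's system: the function name, the single `on_hold` flag, the example timestamp, and the choice to treat a record as due at exactly 30 days are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # retention window stated in the article


def is_due_for_deletion(stored_at: datetime, now: datetime, on_hold: bool = False) -> bool:
    """Return True once a record is past the window and no hold applies."""
    if on_hold:
        # Safety investigation or legal obligation: keep the record longer.
        return False
    return now - stored_at >= RETENTION


stored = datetime(2030, 1, 1, tzinfo=timezone.utc)  # arbitrary example timestamp
print(is_due_for_deletion(stored, stored + timedelta(days=29)))
print(is_due_for_deletion(stored, stored + timedelta(days=30)))
print(is_due_for_deletion(stored, stored + timedelta(days=45), on_hold=True))
```

Checking the hold before the age means a record under investigation is never reported as due, however old it is.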

Anthropic's launch buys a defensive head start, but the company itself frames the larger question: similarly capable models from other labs are coming, and not all of them will ship with a wall of classifiers in front. That head start will matter only if other labs make the same trade-offs between capability, access, and safety.

Source: The Hacker News — Anthropic Releases Claude Fable 5, Its Most Powerful AI Yet, With Cyber Safeguards