Anthropic Unveils Guarded AI Model Amid Hacking Risks

"Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage," the company said in a draft blog sent to CyberScoop.

Anthropic’s choice: Mythos capability, Fable 5 constraints

Anthropic announced it will make an altered version of its powerful internal model, Claude Mythos, available to the public as Claude Fable 5. The company said Fable 5 is the "same underlying model" as Mythos but that for a narrow set of topics — notably cybersecurity and biology — the model will route responses to Claude Opus 4.8, a previous public model. Anthropic framed the move as balancing two pressures: the dangers posed by Mythos-level capabilities and demand from organizations that have been "clamoring for access."

Guardrails, testing, and the limits Anthropic disclosed

Anthropic said it subjected Fable 5 to both internal and external red team testing aimed at common model vulnerabilities such as jailbreaking. The company reported that those tests "identified no known 'universal' jailbreaking techniques," but did not specify whether any partial jailbreaking techniques were found. Anthropic acknowledged the risk that motivated adversaries will try to "circumvent our safety measures," noting that the capability uplift from Mythos-level models could be valuable to those who stand to gain financially from cyberattacks.

Anthropic also warned that the safety routing may blunt legitimate uses. "Because we have prioritized safety, we’ve deliberately tuned the safeguards to be cautious, and they are still stricter than would be ideal—for example, sometimes benign requests will trigger our classifiers," the company wrote, and said it aims to reduce false positives as it updates safeguards after launch.
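To make the routing idea concrete, here is a minimal sketch of topic-based routing with a deliberately cautious classifier. Everything in it is an assumption for illustration: the model labels, the keyword lists, and the `classify_topic` and `route` helpers are hypothetical, and Anthropic has not disclosed how its classifiers or routing actually work.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the two models named in the article.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"

# Hypothetical keyword lists standing in for a real topic classifier.
SENSITIVE_KEYWORDS = {
    "cybersecurity": ("exploit", "malware", "shellcode"),
    "biology": ("pathogen", "toxin"),
}


@dataclass(frozen=True)
class RoutingDecision:
    model: str
    topic: Optional[str]  # None when no sensitive topic matched


def classify_topic(prompt: str) -> Optional[str]:
    """Return the first sensitive topic whose keyword appears in the prompt."""
    lowered = prompt.lower()
    for topic, keywords in SENSITIVE_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return topic
    return None


def route(prompt: str) -> RoutingDecision:
    """Send flagged prompts to the fallback model, everything else to the primary."""
    topic = classify_topic(prompt)
    model = FALLBACK_MODEL if topic else PRIMARY_MODEL
    return RoutingDecision(model=model, topic=topic)


print(route("Summarize this quarterly report"))
print(route("Write an exploit for this bug"))
print(route("How can I exploit a holiday sale?"))  # benign, but still flagged
```

The third call shows the kind of false positive Anthropic describes: a cautious classifier flags a benign request because it shares surface features with a sensitive one.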

Benchmarks: Opus 4.8 versus Mythos Preview — measurable differences

Anthropic published comparative testing results that draw a clear line between Opus 4.8 and Mythos Preview. On a 16‑point proficiency scale for writing end‑to‑end exploits and building exploit primitives, Opus 4.8 averaged 5 out of 16, while Mythos Preview scored closer to 10. In tests where Opus 4.8 was given only high‑level descriptions of previously discovered vulnerabilities in open‑source software projects, the model — without safeguards — reproduced nearly 80% of those vulnerabilities; Anthropic said unspecified safeguards reduce that success rate to 1%.
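As a rough worked example of what these figures imply, the short calculation below converts the reported numbers into fractions and a relative reduction. It treats "nearly 80%" as 0.80 and "closer to 10" as exactly 10, which are approximations assumed here rather than values Anthropic published.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a rate fell, relative to its starting value."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Approximations of the reported figures (assumed, see the note above).
unsafeguarded_rate = 0.80  # "nearly 80%" of vulnerabilities reproduced
safeguarded_rate = 0.01    # success rate with safeguards in place

reduction = relative_reduction(unsafeguarded_rate, safeguarded_rate)
print(f"Relative reduction in success rate: {reduction:.2%}")  # 98.75%

# Scores on the 16-point exploit proficiency scale, as fractions of the maximum.
opus_fraction = 5 / 16     # 0.3125
mythos_fraction = 10 / 16  # 0.625, taking "closer to 10" as 10
print(f"Opus 4.8: {opus_fraction:.1%}, Mythos Preview: {mythos_fraction:.1%}")
```

Under these assumptions, the safeguards cut the reproduction rate by roughly 99% in relative terms, and Mythos Preview scores about twice as high as Opus 4.8 on the proficiency scale.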

Another test measured Opus 4.8's ability to develop Firefox browser exploits: without guardrails, the model produced a full working exploit 8.8% of the time and a partial working exploit 68.8% of the time. Anthropic described Opus 4.8 as a "slight improvement" on earlier 4.7 series models but "generally much less capable than Mythos Preview."

Data retention, the White House executive order, and access timing

Anthropic said it will retain all user traffic to Fable and Mythos for 30 days on both its own platforms and third‑party services. The company stated that the retained data won't be used to train new Claude models or for "any non‑safety‑related‑purpose." A spokesperson later told CyberScoop the company's data retention policies "are specific to their safeguards work and is unrelated to the EO." CyberScoop noted that a White House executive order creates a voluntary framework allowing AI companies to share frontier models with the government up to 30 days before public release.

What this means for Project Glasswing, federal agencies, and cyber defenders

Project Glasswing: Members of Project Glasswing — a consortium of public and private businesses previously given access to a preview version of Mythos — will be able to upgrade to the full Claude Mythos 5 to continue their work, according to Anthropic.
Federal agencies: Anthropic said access to Mythos 5 will be expanded over time "through a more systematic trusted‑access program" that will include federal agencies, signaling a staged rollout beyond the current preview group.
Cyber defenders and enterprises: Anthropic characterized Fable 5's safety routing as likely to blunt both malicious and benign requests in narrow domains; it also promised follow‑up benchmarks and assets to help users understand capabilities and constraints.

Anthropic said it expects to refine the safeguards after launch, alongside the follow‑up benchmarks and assets it has promised for Fable 5. The company also acknowledged that cybersecurity researchers have consistently found ways to jailbreak older AI models, and it warned that adversaries will have incentives to attempt circumvention. The public release therefore tests two linked claims: that routing sensitive topics to a less capable model meaningfully reduces misuse, and that red team testing found no universal jailbreak capable of defeating those safeguards. Anthropic has committed to iterative updates. Whether those updates close the gap between safety and legitimate utility, and whether the safeguards hold up as the broader public probes Fable 5, remains the immediate question the company's follow‑up benchmarks will need to answer.

Source: CyberScoop — Anthropic’s new model is Mythos on a leash