Troy West's dinner in Warsaw was interrupted by his phone. He welcomed it. The message: a trial version of XBOW's offensive security platform had found a vulnerability that cascaded into a full takedown of a development environment used by Moderna. For most security teams, an outage like that is anything but welcome; for the team testing the tool, it was proof that an emerging class of AI-powered tools can outpace traditional testing and accomplish in hours what human testers often cannot.
How XBOW's agentic testing exposed real-world failures at Moderna
XBOW's platform, exercised under trial conditions by Moderna, produced two discrete findings that illustrate the class of problems defenders now face. First, the platform identified a web application firewall (WAF) bypass on an application built with the Spring Boot framework: encoding a single character, a capital “A”, as its percent-encoded URL equivalent (%41) caused the WAF to treat the request as legitimate and allow access. Second, and more consequentially, XBOW found a valid API key embedded in the source code of an internal application called Orders, used by Moderna’s research partners to procure drug substances. With no supplied login credentials, the platform authenticated, probed APIs for SQL injection, and triggered a malformed-input path that dumped garbage data into a shared routing application that other services depended on, causing an outage across the ecosystem.
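To make the first finding concrete: a capital A is byte 0x41, so its percent-encoded form is %41. The sketch below is a toy illustration, not a reconstruction of the WAF or application involved. It assumes a hypothetical blocklist rule and a hypothetical /Admin path, and shows how a filter that inspects the raw path can miss an encoded character that the application behind it decodes.

```python
from urllib.parse import unquote

# Hypothetical rule: block anything under /admin (illustrative only).
BLOCKED_PREFIXES = ("/admin",)


def naive_filter_allows(raw_path: str) -> bool:
    """Check the raw, still-encoded path against the blocklist."""
    return not raw_path.lower().startswith(BLOCKED_PREFIXES)


def normalizing_filter_allows(raw_path: str, max_rounds: int = 3) -> bool:
    """Percent-decode the path (a few rounds, to catch double encoding)
    before checking it against the blocklist."""
    path = raw_path
    for _ in range(max_rounds):
        decoded = unquote(path)
        if decoded == path:
            break
        path = decoded
    return not path.lower().startswith(BLOCKED_PREFIXES)


if __name__ == "__main__":
    for candidate in ("/Admin/users", "/%41dmin/users", "/orders"):
        print(candidate,
              "naive:", naive_filter_allows(candidate),
              "normalizing:", normalizing_filter_allows(candidate))
```

The design point is that the filter and the application should agree on one canonical form of the request before any rule is evaluated; a real WAF would normalize far more than percent-encoding.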
Human pen-testers later validated those findings and acknowledged they would not have found them in the same way. Moderna’s deputy CISO, Farzan Karimi, framed the value plainly: “If we’re able to demonstrate where you could have an outage in a safe testing environment, that’s a great signal.” He also warned that the sheer volume of AI-discovered bugs presents a second-order problem: “How do we now handle the volume of bugs that have gone up due to AI-driven scale?”
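The second finding, a working API key sitting in source code, is the kind of issue a basic pre-commit check can sometimes catch. The following is a minimal sketch under stated assumptions: the regular expression, the variable names it looks for, and the sample key are all hypothetical, and production secret scanners use far richer rule sets and entropy checks.

```python
import re

# Hypothetical pattern: a name containing api_key/secret/token assigned
# directly to a quoted literal of 16 or more characters.
SECRET_PATTERN = re.compile(
    r'(api[_-]?key|secret|token)\w*\s*[:=]\s*["\']([^"\']{16,})["\']',
    re.IGNORECASE,
)


def scan_source(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_name) for each suspected hard-coded secret.

    The secret value itself is deliberately not returned, so that scan
    reports do not become a second place where the key is exposed.
    """
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        match = SECRET_PATTERN.search(line)
        if match:
            findings.append((line_number, match.group(1)))
    return findings


if __name__ == "__main__":
    sample = (
        'ORDERS_API_KEY = "sk_test_0123456789abcdef"\n'   # made-up key
        'api_key = os.environ["ORDERS_API_KEY"]\n'        # read at runtime
    )
    for line_number, name in scan_source(sample):
        print(f"line {line_number}: possible hard-coded {name}")
```

Only the first sample line is flagged; the second reads the key from the environment at runtime, which keeps the value out of the repository.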
Claude Mythos: a turning point in scale and context
Industry leaders point to Anthropic’s Claude Mythos as an inflection point. Jay Chaudhry, CEO of Zscaler, said his team used Mythos to probe the company’s own applications and confirmed the tool found “some serious stuff.” He cautioned that the difference was not necessarily in severity but in volume: “There aren’t enough resources and cycles to fix all those.”
Cisco’s Tom Gillis tied the change to larger context windows and greater reasoning capacity. “The models couldn’t understand the entirety of it before,” he said. “Now they can. That’s why they’re finding all these vulnerabilities.” Legacy network devices and long-lived codebases, some running for decades without meaningful patching, present a larger attack surface that newly capable models can now reason across, producing a deluge of findings that defenders must triage.
Cisco’s Live Protect and the appeal of compensated controls
One operational response is a class of mitigations Cisco calls Live Protect: a compensated control built on eBPF that operates at the Linux kernel level to block threats without rewriting production binaries. Gillis described it as “a pinpoint, laser-fine control that can shield a vulnerability on a production system.” The idea is to cover the interval between discovery and formal patching, “a finger in the dike that plugs a hole until you get to new change control windows,” he said, and to do so without taking systems offline.
Cisco began shipping the product in October, and customer urgency rose sharply after Mythos arrived. “Customers are like, ‘Oh, good story, Tom. I’ll think about it.’ Now it’s like, ‘Oh my God, turn this thing on right now,’” Gillis said. He also acknowledged eBPF’s open-source nature and said he expects competitors to follow a similar path: “While I’m very proud of Cisco leading the market with these compensated controls, I know my competitors have to do this.”
What this means for technologists, procurement leaders, and end users
- Technologists and security teams: Expect higher volumes of validated exploit proofs that demand continuous testing and tighter prioritization. As Karimi put it: “If you have exploit proofs, you can provide that plus-one modifier and really point your developers to remediate the top tier of real risk that’s been validated.”
- Procurement leaders and enterprise IT: The market is reacting. Products such as XBOW’s agentic offensive testing and Cisco’s Live Protect are being positioned as necessary stopgaps to shrink remediation windows; procurement decisions will increasingly weigh speed-to-mitigate as heavily as traditional patch cycles.
- End users and dependent services: Long-running infrastructure and unpatched devices mean outages and compromises can ripple beyond a single app; as the Moderna example shows, an exploitable bug in one service can cascade through shared routing and integration points.
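Karimi's “plus-one modifier” can be read as a simple triage rule: a finding with a working exploit proof moves up one step relative to its base severity. The sketch below is one possible interpretation, not Moderna's actual process. The 1-to-5 severity scale, the field names, and the tie-break rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    title: str
    base_severity: int    # assumed scale: 1 (low) to 5 (critical)
    exploit_proven: bool  # True if a working exploit proof exists


def priority(finding: Finding) -> int:
    """Base severity plus one when the exploit has been validated."""
    return finding.base_severity + (1 if finding.exploit_proven else 0)


def triage(findings: list[Finding]) -> list[Finding]:
    """Order findings by priority; on ties, validated findings come first."""
    return sorted(
        findings,
        key=lambda f: (priority(f), f.exploit_proven),
        reverse=True,
    )


if __name__ == "__main__":
    backlog = [
        Finding("Verbose error page", 4, False),
        Finding("Hard-coded API key", 3, True),
        Finding("SQL injection in API", 4, True),
    ]
    for item in triage(backlog):
        print(priority(item), item.title)
```

With this rule, a validated medium-severity finding is worked ahead of an unvalidated finding of nominally higher severity when their adjusted scores tie, which is the effect Karimi describes: developers are pointed at risk that has been demonstrated rather than merely flagged.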
The broader reckoning for defenders and the warnings from model developers
Model developers are not silent about the risks. Anthropic itself signaled that timelines for publicly available, cybersecurity-focused models are shortening and cautioned: “In that world, cyberattacks could occur much more often, and in much more unpredictable forms.” Gillis was blunt about the stakes for organizations that fail to adapt: “Some people will be slow to change,” he said. “But the consequence of not making that change is gonna be front-page news. It’s a massive, massive compromise. You know, like, ‘you gave up every credit card number.’ Bummer.”
The episode at Moderna, and the reactions from vendors and model builders, suggest the defensive problem is now twofold: AI raises the pace and breadth of discovery while operational models for fixing software remain largely unchanged. Vendors are shipping controls designed to bridge that gap; defenders, meanwhile, must decide which of those temporary plugs they will trust — and for how long.




