Anthropic reported that it and approximately 50 partners used Claude Mythos Preview to find more than 10,000 high- or critical-severity vulnerabilities in systemically important software in a single month.
Anthropic's May 2026 findings and the new volume reality
The May 2026 update from Anthropic, cited in the source article, illustrates a dramatic shift: models pointed at mainstream software now discover vulnerabilities in bulk and at machine scale. When Claude Mythos Preview targeted Firefox, it produced 181 working exploits, compared with just 2 from the previous frontier model. The model surfaced flaws across every major operating system and browser, including an OpenBSD bug that had gone undetected for 27 years. At the time of writing, more than 99% of the vulnerabilities Mythos found remained unpatched.
Zero Day Clock, Verizon's DBIR, and the AWS intelligence picture
The weaponization timeline for discovered vulnerabilities has collapsed. Zero Day Clock reported a 2026 average time-to-exploit of roughly 24 hours, down from approximately 53 days in 2024. Verizon's 2026 Data Breach Investigations Report ties 32% of initial-access techniques to exploitation of vulnerabilities and explicitly expects that share to climb as AI coding assistants lower the barrier for exploit building and tool porting.
Concrete adversary behavior reinforces the trend. An AWS threat-intelligence report from February 2026 documented operations that required no zero-days: the actors industrialized attacks on weak credentials through a custom MCP server that ran offensive tools autonomously. AWS confirmed more than 600 affected devices across 55+ countries; independent researchers reported actor logs that had queued 2,516 devices across 106 countries. Together, these data points show discovery and exploitation operating at machine speed and global scale.
Why ordering faster patching is insufficient
Regulators, boards, and executives are urging faster remediation, and in some cases codifying it. Yet the operational reality is stark. Verizon's 2026 DBIR, which tracked 13,000+ organizations, shows the median fix time for known-exploited vulnerabilities rose to 43 days (from 32 days the year before), while the share of vulnerabilities fully patched dropped from 38% to 26%. Even the best-performing organizations close only about 30–40% of known-exploited vulnerabilities in the first week after detection.
Patching involves regression testing, change windows, approvals, and uptime or compliance constraints. When adversaries operate on an hours-long clock and remediation workflows run in weeks, asking teams to "patch faster" is, as the source puts it, like ordering a freighter to brake on a dime.
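The gap between the two clocks can be put in rough numbers. The sketch below uses only figures quoted above; the function and variable names are illustrative, and it loosely compares an average (time-to-exploit) with a median (fix time), so the result is an order-of-magnitude illustration rather than a measurement.

```python
# Figures quoted in the article; names are illustrative, not from any report.
TIME_TO_EXPLOIT_2024_DAYS = 53.0   # Zero Day Clock, 2024 average
TIME_TO_EXPLOIT_2026_DAYS = 1.0    # "roughly 24 hours"
MEDIAN_FIX_PRIOR_DAYS = 32.0       # Verizon DBIR, year before
MEDIAN_FIX_2026_DAYS = 43.0        # Verizon DBIR 2026


def exposure_window_days(fix_days: float, exploit_days: float) -> float:
    """Days a flaw stays open after a working exploit exists (never negative)."""
    return max(0.0, fix_days - exploit_days)


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


window = exposure_window_days(MEDIAN_FIX_2026_DAYS, TIME_TO_EXPLOIT_2026_DAYS)
attacker_speedup = TIME_TO_EXPLOIT_2024_DAYS / TIME_TO_EXPLOIT_2026_DAYS
fix_slowdown = percent_change(MEDIAN_FIX_PRIOR_DAYS, MEDIAN_FIX_2026_DAYS)

print(f"Exposure window: about {window:.0f} days")
print(f"Attackers are about {attacker_speedup:.0f}x faster than in 2024")
print(f"Median fix time grew by about {fix_slowdown:.0f}%")
```

On these assumptions, a typical known-exploited flaw stays open for about six weeks after an exploit is available, which is the window that compensating controls have to cover.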
Breach and Attack Simulation (BAS): testing controls rather than guessing
Breach and Attack Simulation (BAS) addresses a fundamental question that severity scores do not answer: "What is actually exploitable against us right now, and would our defenses catch it?" Unlike scans or theoretical mappings, BAS runs real-world adversary techniques against a live prevention and detection stack. According to the source, BAS does three things at scale:
- Separates theoretical risk from practical exposure by showing which flaws a WAF, IPS, or EDR already neutralizes and which would permit intrusion.
- Validates the many controls organizations pay for by measuring whether those tools fire as configured and revealing gaps that policy overlap can hide.
- Buys safe time to patch by proving when an asset is covered by controls and when mitigations must precede a standard change process.
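The three points above amount to re-ranking findings by what the controls actually did, not by severity alone. The following is a minimal sketch of that idea, not any vendor's method: the `Outcome` categories, the `Finding` fields, and the placeholder identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Outcome(Enum):
    """Hypothetical result of simulating a technique against live controls."""
    MISSED = 0         # neither blocked nor detected: practical exposure
    DETECTED_ONLY = 1  # logged or alerted, but not stopped
    BLOCKED = 2        # a control neutralized it: safe time to patch


@dataclass(frozen=True)
class Finding:
    vuln_id: str      # placeholder identifier, not a real CVE
    severity: float   # abstract 0-10 severity score
    outcome: Outcome


def triage(findings: List[Finding]) -> List[Finding]:
    """Order by practical exposure first, then by severity (highest first)."""
    return sorted(findings, key=lambda f: (f.outcome.value, -f.severity))


findings = [
    Finding("VULN-A", 9.8, Outcome.BLOCKED),
    Finding("VULN-B", 7.5, Outcome.MISSED),
    Finding("VULN-C", 8.1, Outcome.DETECTED_ONLY),
    Finding("VULN-D", 9.1, Outcome.MISSED),
]

for f in triage(findings):
    print(f.vuln_id, f.severity, f.outcome.name)
```

In this toy data the highest-scoring item, VULN-A, drops to the bottom of the queue because a control already blocks it, while the lower-scoring VULN-B moves up because nothing stops it.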
Field reports cited in the article indicate CISOs are increasingly reserving dedicated budget lines for BAS. Gartner labels a related practice "Adversarial Exposure Validation"; it merges control effectiveness with business context to prioritize what matters to the organization, not just what scores highest on an abstract scale. Paired with autonomous penetration testing, BAS can show both whether an attacker can reach critical assets and whether the organization would detect or block that activity.
Picus' agentic BAS approach: coordination, not code creation
The source highlights an important safety constraint: raw generative models asked to invent exploits can return unsafe or inaccurate output. Picus CTO Volkan Erturk warned that such models might hand back a live malware sample or invent attacks a group never uses. Picus' solution is to use the model for coordination rather than payload creation.
Picus' agentic BAS matches incoming threat reports against a curated, pre-vetted library of safe test building blocks. A multi-agent system orchestrates the work: one agent identifies the threat and lays out a research plan, others gather and validate intelligence from multiple sources, and a builder agent maps adversary TTPs into attack chains ready for safe simulation. The claimed result is an accurate, ready-to-run simulation assembled in minutes that produces posture scores, prioritized mitigations, and executive reports, with humans reviewing exceptions rather than driving every step. The Picus Platform, the source states, continuously tests whether defenses block and detect what matters, points to vendor-specific mitigations, and re-validates closures.
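The "coordination, not code creation" constraint can be illustrated with a lookup-only builder step. This is a sketch under stated assumptions and not Picus' implementation: the library contents, technique names, and `build_chain` function are hypothetical. The point it shows is that reported techniques are only matched against pre-vetted entries, and anything without a match is set aside for human review instead of being generated.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SafeTest:
    """A pre-vetted, non-destructive simulation step (hypothetical shape)."""
    technique: str
    action: str


# Hypothetical curated library; a real one would be maintained and vetted by people.
CURATED_LIBRARY: Dict[str, SafeTest] = {
    "credential-brute-force": SafeTest(
        "credential-brute-force", "replay failed logins against a decoy account"
    ),
    "remote-command-exec": SafeTest(
        "remote-command-exec", "run a harmless marker command on a test host"
    ),
}


def build_chain(
    reported: List[str], library: Dict[str, SafeTest]
) -> Tuple[List[SafeTest], List[str]]:
    """Map reported techniques to vetted tests, preserving report order.

    Returns (chain, unmatched). Nothing is synthesized: a technique with no
    library entry goes to `unmatched` for human review.
    """
    chain: List[SafeTest] = []
    unmatched: List[str] = []
    for technique in reported:
        test = library.get(technique)
        if test is None:
            unmatched.append(technique)
        else:
            chain.append(test)
    return chain, unmatched


chain, unmatched = build_chain(
    ["credential-brute-force", "unknown-technique", "remote-command-exec"],
    CURATED_LIBRARY,
)
print("chain:", [t.technique for t in chain])
print("needs human review:", unmatched)
```

Restricting the builder to a fixed library trades coverage for safety: a novel technique produces a review item, never an improvised payload.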
What this means for CISOs, regulators, and security teams
CISOs: expect dedicated BAS budget lines and tools that run autonomously at machine speed to be prioritized, because traditional triage and patch-first workflows no longer scale against AI-driven discovery and weaponization.
Regulators and boards: many regulations now point toward same-day fixes for some critical vulnerabilities, and executives are demanding faster answers, but the data in Verizon's DBIR shows remediation timelines and patching rates moving the wrong way.
Security teams: the practical path forward described in the source is to validate controls continuously, use BAS and autonomous pentesting to evidence what is truly exploitable, and rely on curated, safe test libraries combined with human oversight so validation keeps pace with autonomous offense.
When discovery-to-exploit collapses from months to hours, the defensible response shifts from racing to patch every finding toward proving, at machine speed, what defenses already stop and where urgent mitigation must be applied. For teams charged with protecting production, that is the gap Picus says its platform was built to close.




