AI-Powered Bug Hunting Spurs Surge in Patches

"At first, yes, this means more patches and thus more work for admins," Dustin Childs told The Register.

Palo Alto Networks: frontier models found 75 issues across 130 products

Palo Alto Networks (PAN) said it scanned its entire codebase using frontier language models, including Anthropic’s Mythos, and found 75 security holes that were grouped into 26 CVEs. Product manager Lee Klarich wrote that PAN began testing Mythos on April 7 and continued using Mythos alongside Claude Opus 4.7 and OpenAI’s GPT-5.5-Cyber. The company reported the LLMs scanned over 130 Palo Alto Networks products and platforms, and that this was the first time the majority of findings were the result of frontier AI models scanning PAN’s code.

PAN said none of the discovered bugs were under active exploitation and that, as of the company’s Wednesday advisory, it had fixed all bugs in its SaaS-delivered products and coded patches for all customer-operated products. Klarich also wrote the company intends “to fix every vulnerability we find before advanced AI capabilities become widely available to adversaries,” and that PAN expects “a narrow three-to-five-month window for organizations to outpace the adversary before AI-driven exploits start to become the new norm.”

Microsoft’s MDASH: more than 100 specialized agents and a record Patch Tuesday

Microsoft said its new multi-model agentic scanning harness, codenamed MDASH, helped researchers find vulnerabilities disclosed on May’s Patch Tuesday. Microsoft initially said MDASH found 17 vulnerabilities across its products on a Patch Tuesday that included 30 critical CVEs. In a blog, Microsoft VP of agentic security Taesoo Kim described MDASH as orchestrating “more than 100 specialized AI agents across an ensemble of frontier and distilled models to discover, debate, and prove exploitable bugs end-to-end.”

Microsoft’s Patch Tuesday disclosures included what the company characterized as 16 new vulnerabilities in the Windows networking and authentication stack, among them four critical remote code execution flaws in components such as the Windows kernel TCP/IP stack and the IKEv2 service. Tom Gallagher, VP of engineering at Microsoft Security Response Center, acknowledged that the month’s release “sits on the larger side of a hotpatch month,” and said he expects AI-assisted bug hunting to increase Patch Tuesday releases as both Microsoft and third-party researchers use these tools.

Mozilla: a sudden surge in Firefox fixes, and Mythos’ contribution

Mozilla reported a sharp increase in fixes for Firefox in April, issuing 423 bug fixes compared with 76 in March and an average of 21.5 fixes per month last year. The browser maker previously said Anthropic’s Mythos found 271 flaws in Firefox 150. Mozilla’s April total — more than five times March’s count and almost 20 times last year’s monthly average — underscores the scale of discovery when frontier models are applied to large, widely used codebases.

Operational reality: triage, patching, and the bottleneck Katie Moussouris warned about

Security practitioners in the source material stress that discovery is only the first step. Katie Moussouris, CEO of Luta, told The Register that finding bugs is the cheap part and that triage, disclosure, building patches that do not break production, and getting customers to deploy them are the expensive, labor-intensive stages. She pointed to Palo Alto Networks’ jump in CVEs and warned that multiplied across vendors the bottleneck becomes admins and vulnerability management teams.

Dustin Childs of Zero Day Initiative echoed that initial increase in findings will translate into more patches and more administrative work, and warned of a lasting problem if patches fail or break systems. “Many customers don’t trust patches as it is, so if AI-related patches break things, they are less likely to apply as time goes on,” he told The Register. Moussouris framed use of multiple models as necessary: “no single model catches everything,” she said, explaining why PAN ran multiple frontier models and why Microsoft orchestrates scores of specialized agents.

What this means for technologists, admins, and procurement leaders

Technologists and security teams: Expect increased discovery rates when frontier models are applied. PAN and Microsoft both reported that multiple models or agent ensembles find different classes of bugs, and PAN said the majority of its May findings came from frontier AI scans.
Admins and vulnerability management teams: Prepare for a heavier workload. Multiple sources in the story warn that triage, testing, and safe deployment of patches will be the limiting steps — and that loss of trust in patches that break systems could reduce patch adoption over time.
Procurement and vendor risk teams: Note the narrow timeline vendors are citing. PAN expressed a three-to-five-month window to outpace adversaries before advanced AI capabilities proliferate; vendors and customers will face pressure to scan codebases now and to fund the downstream work of remediation and safe rollout.

Plaid across these reports is a single, sharp fact: frontier AI models are surfacing far more flaws than prior workflows did, and organizations are now confronting the harder questions of fixing, testing, and deploying those fixes without breaking production. Vendors such as Palo Alto Networks and Microsoft are racing to find and patch proactively — and they are explicit about a short window to do so before attackers can leverage similar AI capabilities. The practical puzzle left hanging by the evidence presented here is whether triage, patch engineering, and deployment practices can scale quickly enough to preserve trust in patches during that three-to-five-month sprint.

Original story