AI Agent Exposes 21 Zero-Days in Widely Used FFmpeg Library

An autonomous agent found 21 confirmed zero-days in FFmpeg in a single run that its maker, depthfirst, says cost about $1,000.

depthfirst's agent and the FFmpeg sweep

Security startup depthfirst reported that its autonomous security agent scanned roughly 1.5 million lines of C in the FFmpeg codebase and found 21 confirmed zero-days, each with a reproducible proof-of-concept input. The company said nine of the bugs already carry CVE identifiers, CVE-2026-39210 through CVE-2026-39218, and that the remaining issues are fixed upstream but not yet numbered.

What the FFmpeg bugs look like

depthfirst described most of the findings as heap or stack overflows in parsers and demuxers, in components ranging from the TS (MPEG transport stream) demuxer to the VP9 decoder. Several of the bugs are long-lived: depthfirst says multiple flaws had been latent for 15 to 20 years, and one stack overflow in service-description-table code dates to 2003, untouched for 23 years. The company’s writeup includes a published proof-of-concept with reproducible inputs that trigger the flaws.

Chrome 149, record patch volume, and the AI connection

Days after the FFmpeg disclosure, Google shipped Chrome 149 with fixes for 429 security bugs, the most ever in a single release. More than 100 of those patches were rated critical or high severity, many of them described as use-after-free or insufficient input validation issues. The highest-severity item called out in the release notes is CVE-2026-10881 (CVSS 9.6), an out-of-bounds read and write in the ANGLE graphics engine that can let a crafted page escape the browser sandbox and run code on the host; Google paid $97,000 for that report.

Google’s release notes attribute the surge in reported vulnerabilities to an earlier overhaul of its bug bounty program, which was prompted by a flood of AI-generated submissions. Since a change in April, the program asks for a concise reproducer in addition to the longer writeups that AI agents tend to produce. Even so, Google has not directly tied the record number of fixes to AI authorship: of roughly 90 high-severity bugs, only 10 came from outside researchers, and Google found 19 of the 22 critical bugs internally.

Other autonomous finds and comparative evidence

depthfirst’s run is not an isolated case. Google’s Big Sleep agent reported an earlier batch of FFmpeg bugs (listed on the FFmpeg security page with the tag BIGSLEEP), and Anthropic’s Mythos model found a 16-year-old H.264 flaw and other FFmpeg issues for about $10,000; according to Anthropic’s writeup, fixes for three of those shipped in FFmpeg 8.1. Separately, another autonomous tool recently found an authenticated remote code execution bug in Redis that had been present since version 7.2.0 and gone unnoticed for over two years.

Academic work supports the operational claim that agents can be efficient: a February study used an agent to reproduce working proof-of-concepts for more than half of 100 real Linux kernel N-day bugs, and found that the agent outperformed fuzzing on that sample.

Practical guidance for operators and users

depthfirst and the original reporting recommend patching immediately: pull the fixed upstream FFmpeg build or apply your distribution’s security update as soon as it lands, prioritizing anything that ingests untrusted RTSP or AV1-over-RTP streams. The advisory stresses that FFmpeg is widely bundled (in media pipelines, Python wheels, container images, and appliances), so embedded copies outside system packages must be patched too.
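Finding those embedded copies is usually the tedious part. As a rough starting point, the sketch below walks a directory tree (for example a virtualenv’s site-packages or an unpacked container image filesystem) and lists files whose names look like FFmpeg’s shared libraries or command-line tools. The filename patterns are assumptions based on common naming (`libavcodec.so.61`, `avcodec-61.dll`, `libavcodec.61.dylib`, hash-suffixed copies vendored inside wheels). A match says nothing about the version, and copies statically linked into other binaries will not show up, so treat hits as leads to check rather than a complete inventory.

```python
import os
import re
import sys
from pathlib import Path

# Assumed naming patterns for FFmpeg's shared libraries across platforms, e.g.
# libavcodec.so.61, libavcodec-1a2b3c4d.so.61.3.100 (wheel-vendored),
# avcodec-61.dll (Windows), libavcodec.61.dylib (macOS).
LIB_RE = re.compile(
    r"^(?:lib)?(?:avcodec|avformat|avutil|avfilter|avdevice|swresample|swscale)"
    r"(?:[-.][^/\\]*)?\.(?:so(?:\.\d+)*|dylib|dll)$",
    re.IGNORECASE,
)

# Standalone FFmpeg command-line tools that images and appliances often ship.
EXE_NAMES = {"ffmpeg", "ffprobe", "ffmpeg.exe", "ffprobe.exe"}


def is_ffmpeg_artifact(filename: str) -> bool:
    """Return True if a bare filename looks like an FFmpeg library or binary."""
    return filename.lower() in EXE_NAMES or LIB_RE.match(filename) is not None


def find_bundled_ffmpeg(root: str) -> list[Path]:
    """Walk `root` and return sorted paths of files that look like FFmpeg."""
    hits = []
    # os.walk does not follow directory symlinks by default and silently
    # skips unreadable directories (including a nonexistent root).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_ffmpeg_artifact(name):
                hits.append(Path(dirpath) / name)
    return sorted(hits)


if __name__ == "__main__":
    # Usage: python find_ffmpeg.py /path/to/site-packages
    for path in find_bundled_ffmpeg(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Matching on filenames keeps the scan fast and dependency-free; confirming the version of each hit (for example from package metadata or the library’s own version symbols) is a separate step.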

For Chrome, update to 149.0.7827.53 on Linux or 149.0.7827.53/54 on Windows and macOS, or confirm that auto-update has run.
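Where auto-update status is unclear (on a managed fleet, say), the check is easy to script. The sketch below compares an installed Chrome version string against the fixed builds listed above, taking 149.0.7827.53 as the minimum on every platform, since that build number is listed for Linux, Windows, and macOS. The platform keys and the `extract_version` helper are illustrative assumptions; on Linux, `google-chrome --version` prints a line such as `Google Chrome 149.0.7827.53` that the helper can parse.

```python
import re
from typing import Optional

# Minimum fixed Chrome 149 builds per the advisory. Windows and macOS also
# received 149.0.7827.54; 149.0.7827.53 is the lowest fixed build everywhere.
FIXED_CHROME = {
    "linux": "149.0.7827.53",
    "windows": "149.0.7827.53",
    "mac": "149.0.7827.53",
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '149.0.7827.53' into (149, 0, 7827, 53) for numeric comparison."""
    return tuple(int(part) for part in version.strip().split("."))


def extract_version(text: str) -> Optional[str]:
    """Pull the first dotted four-part version out of e.g. `--version` output."""
    match = re.search(r"\d+\.\d+\.\d+\.\d+", text)
    return match.group(0) if match else None


def is_patched(installed: str, platform: str) -> bool:
    """True if `installed` is at or above the fixed build for `platform`."""
    return parse_version(installed) >= parse_version(FIXED_CHROME[platform])


if __name__ == "__main__":
    version = extract_version("Google Chrome 149.0.7827.53 ")
    print(version, is_patched(version, "linux"))
```

Comparing integer tuples rather than raw strings matters here: as strings, `"149.0.7827.9"` would sort above `"149.0.7827.53"`.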

Operational pressure: cheap discovery, costly response

The two episodes together underline a systemic strain: autonomous agents have made finding vulnerabilities much cheaper and faster, but triaging reports, shipping fixes, and getting those fixes installed remain labor-intensive. The reporting notes that much of that remediation burden still falls on volunteers and a thin layer of human triagers who must keep pace with machine-generated output. The suggested response is shorter patch cycles, aggressive auto-update where available, and treating dependency bumps that include CVE fixes as security work rather than routine maintenance.
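One cheap way to act on that last point is to flag dependency updates whose changelog or advisory text cites CVE identifiers, so they get routed as security work. A minimal sketch, assuming the changelog text is available as a string; the `classify_bump` function and its two labels are hypothetical, and the regex follows the standard CVE ID syntax (`CVE-`, a four-digit year, then a sequence number of four or more digits).

```python
import re
import sys

# CVE IDs: "CVE-", a four-digit year, then a sequence number of 4+ digits.
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def cve_ids(text: str) -> list[str]:
    """Return the unique CVE IDs mentioned in `text`, upper-cased and sorted."""
    return sorted({match.upper() for match in CVE_RE.findall(text)})


def classify_bump(changelog: str) -> str:
    """Hypothetical triage rule: any cited CVE makes the bump security work."""
    return "security" if cve_ids(changelog) else "maintenance"


if __name__ == "__main__":
    # Usage: pipe a dependency's changelog into the script.
    notes = sys.stdin.read() if not sys.stdin.isatty() else "Example: fixes CVE-2026-39210."
    print(classify_bump(notes), cve_ids(notes))
```

The rule is deliberately blunt: it will miss fixes that ship before an ID is assigned (as with the unnumbered FFmpeg issues), so it complements rather than replaces reading the release notes.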
