AI Worm Uses Open-Weight Models to Spread, Evade Defenses

"In 15 isolated runs on a deliberately vulnerable 33-host network, the worm identified an average of 31.3 vulnerabilities and gained elevated access on 23.1 hosts." That stark metric opens a practical question: what happens when an attacker hands a local, open-weight large language model the keys to think for itself inside a network?

How the University of Toronto team built the worm

Researchers led by associate professor Nicolas Papernot at the CleverHans Lab — with collaborators at the Vector Institute, the University of Cambridge, and ServiceNow — published a preprint to arXiv on June 2 describing a proof-of-concept AI-driven worm. The experiment runs an open-weight large language model locally on a single GPU and uses it to reason, generate tailored attack strategies for each target, and replicate, all without human intervention and without calling a commercial AI API. The preprint is currently under peer review.

What the FakeCorp experiments showed

The team executed 15 independent runs against an isolated, intentionally vulnerable 33-host network they called "FakeCorp." The environment included Ubuntu, Debian, Rocky Linux, Alpine, Windows Server (2008 R2, 2019, and 2022), and IoT devices; every host had at least one real-world vulnerability. Five machines had GPUs. Across the runs the worm identified an average of 31.3 vulnerabilities, gained elevated access on an average of 23.1 hosts, and replicated autonomously to an average of 20.4 hosts (about 62% of the full network) over seven days, with no prior knowledge of the topology and no human input.
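The percentages behind those averages are easy to check. The short Python sketch below uses only the figures reported above; the function and variable names are illustrative and do not come from the paper.

```python
# Figures reported in the article for the 33-host FakeCorp network.
TOTAL_HOSTS = 33
MEAN_ELEVATED = 23.1    # hosts with elevated access, averaged over 15 runs
MEAN_REPLICATED = 20.4  # hosts the worm replicated to, averaged over 15 runs


def share_of_network(mean_hosts: float, total: int = TOTAL_HOSTS) -> float:
    """Return a mean host count as a percentage of the whole network."""
    return 100.0 * mean_hosts / total


print(f"elevated access: {share_of_network(MEAN_ELEVATED):.1f}%")    # 70.0%
print(f"replicated to:   {share_of_network(MEAN_REPLICATED):.1f}%")  # 61.8%
```

The replication figure rounds to the "about 62%" quoted above, and elevated access works out to roughly 70% of the network.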

Propagation reached up to seven generations, with a mean of 5.1. Individual exploit attempts succeeded about 44% of the time; the paper attributes most failures to malformed payload syntax rather than faulty reasoning. In a separate test of "full GPU-tier replication," where the worm stages a local LLM instance on a compromised GPU host, replication succeeded in 68.8% of attempts.

Exploits used and the patching-window problem

The worm did not carry pre-programmed exploit chains. Instead, the model inspected services and advisories at runtime and reasoned its way to exploits such as SambaCry, a writable root cron escalation, Dirty Pipe, PrintNightmare, Drupalgeddon 2, Exim RCE, blind SQL injection, JWT bypass, and Redis escape. The team mapped radial propagation trees that showed mixed Linux, Windows, and IoT spread.

Crucially, the worm worked around its training cutoff by ingesting fresh public advisories, and it successfully exploited hosts configured with vulnerabilities disclosed after the model was trained. The paper lists three such cases, spanning four CVE identifiers: CVE-2026-39987 (a pre-authentication RCE in the Marimo Python notebook, CVSS 9.3), CVE-2026-31431 (CopyFail, a Linux kernel privilege-escalation flaw in algif_aead), and CVE-2026-43284 / CVE-2026-43500 (DirtyFrag-related local privilege-escalation issues). Against the three hosts carrying those flaws, the worm reached root in 41 of 67 attempts. The authors point to a renewed "patching-window" problem: a published advisory becomes an immediate playbook for an adaptive agent while defenders are still validating the fix.
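As a quick worked figure, the 41-of-67 result can be turned into a rate. This is a minimal sketch using only the numbers reported above; the variable names are illustrative.

```python
# Reported result on the hosts with post-training-cutoff vulnerabilities.
ROOT_SUCCESSES = 41
ATTEMPTS = 67

post_cutoff_rate = ROOT_SUCCESSES / ATTEMPTS
print(f"root reached in {post_cutoff_rate:.1%} of attempts")  # 61.2%
```

That is roughly 61%. The 44% figure quoted earlier covers individual exploit attempts across the whole network, so the two rates may not measure the same thing and should be compared with caution.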

Containment challenges: zero marginal cost and no central kill switch

The paper argues that two structural shifts make this kind of worm hard to contain. First, the attacker's marginal cost moves from rented commercial API access to whatever compute the worm can seize inside victim infrastructure: once a GPU foothold exists, inference can run on compromised machines. Second, because the worm relies on open-weight models with no vendor dependency, provider-side controls such as service refusals, rate-limiting, or account suspension do not apply; there is no API key to revoke. The researchers also observed the worm rewriting its own code in several instances to bypass local security controls, a behavior they did not explicitly program.

The prototype was built without stealth: no encryption, no polymorphism, no persistence mechanisms, and no log cleanup. The authors warn that a malicious variant adding persistence, encrypted payloads, process masquerading, and log cleanup would eliminate many of the easy signals defenders currently have.

What this means for technologists, enterprises, and regulators

Technologists and security teams: Segment GPU-capable machines and apply zero-trust controls to limit lateral reach. Monitor for behavioral signals the paper identifies: non-standard port activity, automated SSH public key injection, and unexpected clusters of LLM inference appearing on endpoints. Rotate credentials exposed on any compromised or credibly suspected host; the worm used credential reuse as a rapid propagation path.
Affected enterprises and procurement leaders: Assume that every published advisory can be weaponized immediately. Prioritize patching of internet-facing CVEs, and apply compensating controls where rapid deployment is not possible. Expect that flat networks with reachable GPU hosts dramatically lower the compute cost of further compromise.
Policymakers and defensive research communities: The University of Toronto is not releasing the implementation publicly; it is establishing a vetting process for qualified defensive researchers to request access. That limited distribution reflects the dual-use nature of the work and forces a trade-off between controlled study and wider defensive adoption.
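One of the behavioral signals listed above for security teams, automated SSH public key injection, can be checked by comparing a host's authorized_keys file with an approved baseline. The Python sketch below is a minimal illustration under stated assumptions, not a method from the paper: the function names are hypothetical, the key strings are fake placeholders, and the baseline is assumed to be stored somewhere the host itself cannot modify.

```python
def parse_keys(authorized_keys_text: str) -> set[str]:
    """Return the set of key entries, ignoring blank lines and comments."""
    keys = set()
    for line in authorized_keys_text.splitlines():
        entry = " ".join(line.split())  # collapse runs of whitespace
        if entry and not entry.startswith("#"):
            keys.add(entry)
    return keys


def unexpected_keys(baseline_text: str, current_text: str) -> list[str]:
    """Return entries present in the current file but not in the baseline."""
    return sorted(parse_keys(current_text) - parse_keys(baseline_text))


# Placeholder contents; a real check would read the file from each host.
baseline = "# approved keys\nssh-ed25519 AAAAEXAMPLEAPPROVEDKEY admin@example\n"
current = (
    "# approved keys\n"
    "ssh-ed25519 AAAAEXAMPLEAPPROVEDKEY admin@example\n"
    "ssh-rsa AAAAEXAMPLEINJECTEDKEY unknown@host\n"
)

for key in unexpected_keys(baseline, current):
    print("unexpected key:", key)
```

A comparison like this only flags additions. It would not catch an attacker who replaces an approved key's comment field or edits the baseline itself, so it is best treated as one signal among several.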

For defenders, the takeaway, in the researchers' own terms, is that the technical surface for automated exploitation has shifted from third-party APIs to local compute and network architecture. The proof-of-concept is not a polished weapon today, but neither is it bound by the old assumptions that allowed single-CVE patching to suffice. How quickly defensive practice and policy adapt to that new locus of risk will likely determine whether the next generation of worms remains an academic exercise or becomes an operational problem.

Source: The Hacker News — Researchers Build Self-Replicating AI Worm That Operates Entirely on Local, Open-Weight Models