"Not a bug but a dangerous convention and a class of problems," the researchers at Adversa AI write. With that phrase they describe a weakness in mature open-source coding agents: a command can pass the agent's safety filter as harmless and still reach the shell, which rewrites it into something destructive before running it.
GuardFall and the shell-rewrite gap
Adversa AI calls the bypass GuardFall. It exploits a mismatch between how most agents inspect commands and how bash actually executes them. The agents run text-based blocklist checks against the literal command string, but the shell transforms that string before running it: it removes quotes, expands shorthand, connects pipelines, and, when a decoded payload is piped into another shell, runs content that was hidden in base64 or a similar encoding. A string that looks harmless to a plain-text matcher can therefore become a destructive command once bash has finished with it. Adversa gives a concrete example: a filter watching for rm sees nothing wrong with r''m, but bash strips the empty quotes and runs rm anyway. The same trick works with a base64-encoded payload piped into a shell, and with common utilities such as find and dd that become destructive when given particular flags.
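To make the mismatch concrete, here is a minimal Python sketch, assuming a hypothetical guard that searches the raw string for the word rm with a regular expression. It contrasts that guard with Python's standard shlex module, which strips quotes the way a POSIX shell does. shlex is only an approximation of bash (it performs no variable, command, or glob expansion), and the function names are illustrative, not taken from any of the surveyed agents.

```python
import re
import shlex

# Hypothetical text-only guard of the kind GuardFall targets:
# it searches the raw command string for the standalone word "rm".
NAIVE_BLOCKLIST = re.compile(r"\brm\b")


def naive_guard_allows(command: str) -> bool:
    """Return True when the plain-text matcher finds nothing to block."""
    return NAIVE_BLOCKLIST.search(command) is None


def words_after_quote_removal(command: str) -> list[str]:
    """Approximate the words bash sees after quote removal.

    shlex in POSIX mode strips quotes the way the shell does, but it performs
    no variable, command, or glob expansion, so this is only an approximation.
    """
    return shlex.split(command, posix=True)


if __name__ == "__main__":
    cmd = "r''m -rf ./build"
    print(naive_guard_allows(cmd))         # True: the filter sees no "rm"
    print(words_after_quote_removal(cmd))  # ['rm', '-rf', './build']
```

Run as a script, it prints True and then ['rm', '-rf', './build']: the text filter approves a command whose first word, once the quotes are gone, is rm.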
Ten vulnerable agents, one that held up: the scope of the survey
Adversa tested eleven popular open-source coding and computer-use agents. Ten left the gap open: opencode, Goose, Cline, Roo-Code, Aider, Plandex, Open Interpreter, OpenHands, SWE-agent, and the Hermes project, where the issue first surfaced and where it is documented in the project's issue tracker. Only one, Continue, defended successfully, and only in its default editor mode. Together, the tools in Adversa's survey carried roughly 548,000 GitHub stars as of May 2026.
How the bypass plays out in real tests
Adversa describes the attack shape as simple and non-exotic: two conditions must align. First, the model must produce the malicious command, usually not as a blunt "run rm -rf" (such requests are often blocked) but tucked inside otherwise normal-looking content such as a build file, a documentation-style reply, or a config file supplied by the repository. Second, the agent must be running autonomously, with an auto-execute flag enabled or its container sandbox disabled; both setups are common in automated pipelines. Adversa ran live tests using Claude Sonnet 4.6 and demonstrated a full end-to-end exploit against the production Plandex binary; the same attack shape worked against eight other agents in those tests. Adversa characterizes the work as lab research and notes that no public exploitation has been reported.
How Continue defends and what is left open
Continue resists GuardFall by parsing the command the way bash will before deciding whether to run it: it breaks the command into the same pieces the shell would, checks what will actually run, and enforces a hard list of destructive commands that are blocked outright. That approach held against every one of Adversa's payloads in Continue's default editor mode. Continue's command-line auto-run mode is weaker: a few payloads slipped through there, though the most destructive ones still hit the hard block. Adversa calls Continue's design portable and estimates that re-implementing it in another agent is roughly a two-day job for an experienced engineer.
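A guard in the spirit of that description can be sketched in Python with the standard shlex module. This is a minimal sketch under stated assumptions, not Continue's implementation: the block lists, the find rule, and every function name are hypothetical. It tokenizes the command with POSIX quote removal, splits it at control operators such as ;, && and |, and denies the command if any resulting word names a hard-blocked program or a shell interpreter (which catches a base64 payload piped into sh). Because shlex cannot perform variable, command, or glob expansion, the sketch fails closed whenever it sees $, backticks, newlines, or glob and brace characters.

```python
import os
import shlex

# Hypothetical hard block list for illustration; not Continue's actual list.
HARD_BLOCK = {"rm", "dd", "mkfs", "shred"}
# Piping data into a shell interpreter runs code the guard never inspected.
SHELLS = {"sh", "bash", "zsh", "dash"}
# find becomes destructive with these flags (illustrative, not exhaustive).
FIND_DANGEROUS_FLAGS = {"-delete", "-exec", "-execdir", "-ok", "-okdir"}
OPERATOR_CHARS = set("();<>|&")
# Expansions this approximation cannot resolve; the guard fails closed on them.
UNRESOLVED_CHARS = set("$`\n\r")
GLOB_CHARS = set("*?[{")


def split_commands(command: str) -> list[list[str]]:
    """Tokenize roughly as a POSIX shell would, then split on control operators."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # never let a "#" hide the rest of the line from the check
    segments: list[list[str]] = [[]]
    for token in lexer:
        if set(token) <= OPERATOR_CHARS:
            # Every operator run (;, &&, |, >, ...) counts as a boundary: conservative.
            segments.append([])
        else:
            segments[-1].append(token)
    return [segment for segment in segments if segment]


def shell_aware_guard_allows(command: str) -> bool:
    """Return True only if no word left after quote removal names a blocked program."""
    if UNRESOLVED_CHARS & set(command):
        return False
    try:
        segments = split_commands(command)
    except ValueError:  # e.g. an unterminated quote
        return False
    for words in segments:
        for word in words:
            if GLOB_CHARS & set(word):
                return False
            program = os.path.basename(word)  # /bin/rm -> rm
            if program in HARD_BLOCK or program in SHELLS:
                return False
        if os.path.basename(words[0]) == "find" and FIND_DANGEROUS_FLAGS & set(words):
            return False
    return True
```

Under these assumptions, shell_aware_guard_allows("ls -la && git status") returns True, while r''m -rf ./build, echo cm0gLXJmIC8= | base64 -d | sh, and find . -type f -delete are all refused. Checking every word rather than only the first trades false positives (echo rm is refused too) for resistance to wrappers such as sudo or xargs; a production guard would likely want a full bash parser rather than shlex.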
Operational mitigations for operators, maintainers, and end users
- Point $HOME at a throwaway folder so that credentials stored in ~/.ssh and ~/.aws are out of reach.
- Turn off auto-execute flags such as --auto-exec, --auto-run, --auto-test, and --dangerously-skip-permissions unless the job cannot pause for a human.
- Do not let agents run on pull requests from forks; those PRs are a straightforward path from an attacker's file to live execution.
- Treat config files shipped inside a repository, such as .aider.conf.yml, as untrusted code, because a malicious config can trigger the attack on the first accepted edit.
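Three of these mitigations can be combined in a small launcher. The Python sketch below is illustrative only: preflight and run_agent_isolated are hypothetical names, the agent command line is whatever you pass in, and the config list holds only the .aider.conf.yml example from the list above. It refuses to start when the command line carries one of the auto-execute flags named above (stricter than the list, which allows them when a job cannot pause for a human) or when the repository ships a known agent config, and otherwise runs the agent with HOME pointed at a temporary directory that is deleted afterwards. Redirecting HOME narrows exposure rather than removing it: files stay readable by absolute path, credentials held in environment variables are still passed through, and the agent must be given its own settings or API keys deliberately.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Auto-execute flags named in the mitigation list; which agent accepts which varies.
AUTO_EXEC_FLAGS = {"--auto-exec", "--auto-run", "--auto-test",
                   "--dangerously-skip-permissions"}
# Repository-shipped agent configs to treat as untrusted (extend as needed).
UNTRUSTED_CONFIGS = (".aider.conf.yml",)


def preflight(argv: list[str], repo: Path) -> list[str]:
    """Return the reasons to refuse a launch; an empty list means go ahead."""
    problems = [f"auto-execute flag {arg}" for arg in argv
                if arg.split("=", 1)[0] in AUTO_EXEC_FLAGS]
    problems += [f"repo-supplied config {name}" for name in UNTRUSTED_CONFIGS
                 if (repo / name).exists()]
    return problems


def run_agent_isolated(argv: list[str], repo: Path) -> int:
    """Run argv in repo with HOME pointed at a throwaway directory."""
    problems = preflight(argv, repo)
    if problems:
        raise RuntimeError("refusing to launch: " + "; ".join(problems))
    with tempfile.TemporaryDirectory(prefix="agent-home-") as fake_home:
        # Shell tilde expansion and tools that honour HOME now resolve
        # ~/.ssh and ~/.aws inside this empty directory.
        env = dict(os.environ, HOME=fake_home)
        return subprocess.run(argv, cwd=repo, env=env, check=False).returncode
```

Fork pull requests are better handled in the CI system's own trigger configuration than in a launcher like this one.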
None of these quick fixes is a complete answer, Adversa warns: adding more blocklist patterns does not close the underlying mismatch between text-based checks and shell rewriting, and because this is a class of problems rather than a single bug, no one CVE or patch will make it go away.
What this means for technologists, procurement teams, and end users
- Technologists and security teams: expect to rethink guards that rely on plain-text pattern matching; Continue's approach suggests a concrete, short-term engineering path to more robust checks.
- Procurement teams and maintainers of automated pipelines: re-evaluate default settings that enable auto-run or remove sandboxing, and treat repository-supplied config and documentation as high-risk inputs.
- End users and developers running agents locally: follow the operational mitigations above and avoid running agents with full account privileges against untrusted repositories.
GuardFall is one more entry in a run of findings this year that highlight the same peril: untrusted text reaching a real shell before the guard understands what bash will actually run. Adversa points to related work earlier this year, including its TrustFall research and other deny-rule bypasses, and names attacks such as AutoJack and Agentjacking as part of the same thread. The record is now specific: ten widely used open-source agents expose a practical path from text to destructive commands, Continue demonstrates a viable defensive pattern, and Adversa says implementing that pattern is an engineering task measured in days, not months. Whether maintainers adopt it will determine how quickly this "dangerous convention" gives way to a sound engineering standard instead of remaining a standing vulnerability.