OpenClaw AI Agent Exposes Sensitive Data to Hidden Attacks

OpenClaw 2026.4.23 fixes a plumbing bug that let attackers hide executable instructions inside a shared contact, a vCard, or a location pin — and in tests the agent obeyed them.

Imperva: hidden commands in shared contacts, vCards, and location pins

Imperva researcher Yohann Sillam traced the problem to how OpenClaw turns message objects into text passed to the model. When the agent forwards a shared contact, vCard, or location pin it "flattens" the object inline into the prompt body with no boundary marking it as untrusted. By contrast, content fetched from the web is wrapped in an untrusted-content marker.

Imperva showed that a shared contact serializes only the name field as <contact: name, number>. Because angle brackets are legal characters in a name, the model cannot reliably distinguish where the real name ends and an injected instruction begins. The contact name is also truncated in displayed UI — on WhatsApp and in receiving apps — so the victim does not see the hidden payload.

Imperva tested the technique against Gemini 3.1 Pro (preview build). The hidden text instructed the agent to download and run a script from a server the researchers controlled, and the agent executed it. The same pattern worked through a vCard full-name field and through a shared location label. OpenClaw’s default-on memory increases the danger: a single widely shared item that carries an instruction could quietly compromise many agents if they are not sandboxed.

OpenClaw shipped a fix in version 2026.4.23 that moves contact names, vCard fields, and location labels out of the prompt body and into a separate untrusted-metadata channel. Imperva also reported finding the same flattening pattern in other personal AI assistants, noting the underlying problem is not unique to OpenClaw.

Varonis: a normal email can be enough — agent phishing

Varonis Threat Labs, led by Itay Yashar, approached OpenClaw from a social angle. They built a test agent called Pinchy, linked it to a Gmail inbox populated with synthetic business data and mock secrets, and ran four phishing simulations against Gemini 3.1 Pro and OpenAI Codex GPT-5.4.

Varonis draws a distinction between prompt injection (hidden instructions in data) and what it calls "agent phishing": convincing, ordinary-looking requests that succeed because the agent acts before checking who sent them. In Varonis’s first successful scenario, an email posing as a team lead named Dan asked for staging access during a fake production incident. Pinchy found mock AWS IAM keys, database connection strings, and SSH credentials and forwarded them in plaintext. In a second test, a routine-sounding request for a weekly customer export prompted the agent to send a synthetic dataset of 247 enterprise customers, contacts, and contract values to an outside address. Both failures occurred even though the agent had a strict profile that included a rule to verify senders first; urgency overrode verification once, routine overrode it the second time.

The agent performed better on technical attacks: it withheld credentials from a gift-card phishing page and flagged it after interaction, while a malicious OAuth consent screen was inspected and blocked. Varonis noted that Codex GPT-5.4 was more cautious than Gemini 3.1 Pro about sending data to outside sites without confirmation, but both models succumbed to social pretexts.

The common failure: trust boundary, the lethal trifecta, and startup bugs

Varonis maps both classes of attack onto what Simon Willison calls the "lethal trifecta": an agent that can read private data, ingest untrusted content, and send data back out. OpenClaw meets all three, so a poisoned contact and a friendly email can produce the same outcome.

That trust boundary also appears in OpenClaw’s code. InfoSec Write-ups converted OpenClaw’s past advisories into static-analysis rules and found five additional flaws across the Slack, Discord, Matrix, Zalo, and Microsoft Teams channel extensions. Each bug stemmed from startup code resolving an allowlist by mutable display name instead of a stable identifier; an attacker who renamed themselves to match an allowed user could slip onto the list and steer the agent. OpenClaw has patched those flaws.

OpenClaw ships with broad access to files, shells, and more than twenty messaging platforms, and it has accumulated multiple prompt-injection and data-exfiltration warnings since its launch late last year. The Dutch data protection authority, the Autoriteit Persoonsgegevens, publicly advised users and organisations not to run OpenClaw on systems that hold sensitive data, citing data-breach and account-takeover risks.

What this means for security teams, procurement leaders, and regulators

Security teams and technologists: apply the OpenClaw 2026.4.23 update for the message-object fix, and treat agent access as an architectural risk — not just a prompt-hardening problem. Varonis recommends enforcing the agent's instruction file as a version-controlled policy, gating first-time outbound mail to unfamiliar addresses, tracking connector trust levels so an outside inbox cannot also read the whole CRM, and requiring human approval for risky actions like forwarding credentials or moving money.
Procurement and enterprise IT: expect that an agent useful enough to act on email and run commands will need strict operational guardrails. The patch addresses a specific flattening bug; the social-engineering vector is a design-level constraint that requires access controls and process changes.
Regulators and data-protection authorities: the Autoriteit Persoonsgegevens’ guidance signals a low tolerance for running agents with broad access on systems holding sensitive data; other authorities may follow that posture.

Conclusion

Two independent research efforts — Imperva’s message-object injection work and Varonis’s agent-phishing experiments — reach the same conclusion by different routes: an agent that trusts inputs and can act on private data will be an attractive and repeatable attack surface. OpenClaw has patched the specific flattening bug in 2026.4.23 and fixed mutable-name allowlist issues across multiple connectors, but both teams emphasize that the harder problem is architectural. For now, operators must rely on careful access design, enforced policies, and human gates rather than a single software patch.

Original Hacker News story