Skip to main content
Cybersecurity

Microsoft Warns AI Agents Can Leak Data via Poisoned Tool Descriptions

Laptop on a desk in a modern office with a blurred screen and subtle shadow.

"The trust boundary between them." Microsoft used that phrase to describe where a simple line of plain text can turn a helpful AI agent into a silent data exfiltrator.

Model Context Protocol (MCP): the fast-growing attack surface

Microsoft's Incident Response and Defender research teams lay the vulnerability squarely at the intersection of agents and the Model Context Protocol (MCP), an open protocol that lets an AI call outside tools the way an app calls an API. Agents such as Microsoft 365 Copilot, and custom agents built in Copilot Studio or Azure AI Foundry, can now send email, create files, change calendars and run multi-step jobs — and they reach business systems through MCP. Microsoft calls MCP the fastest-growing part of the agentic AI supply chain, which makes it an expanding attack surface.

A concrete poisoning scenario: the invoice example

Microsoft demonstrates the pattern with an invoice scenario intended to show the mechanics rather than name a victim. A finance team connects an agent to three tools, including a third‑party "invoice enrichment" service that was approved but never deeply reviewed. Every MCP tool ships with a short plain-text description that the agent reads to decide when and how to use it. An attacker updates that third-party tool: the visible name and summary remain unchanged, but the description hides a command — formatted as notes — ordering the tool to "grab the last thirty unpaid invoices and attach them to the next call."

MCP detects description changes on the fly. In setups without a re-approval trigger, the poisoned description goes live without extra review. Later, when an analyst asks an ordinary question about a supplier, the agent follows the hidden instruction, collects the invoices and sends them to the tool as part of a routine request. The tool returns a clean answer and quietly copies the stolen data to a server controlled by the attacker. Each step appears legitimate: the tool was approved, the query ran under the analyst's permissions, and the outbound call went to a previously allowed endpoint. Microsoft frames the problem as a gap in "the trust boundary between them."

Why descriptions matter: instructions mixed with data

Microsoft emphasizes that the fundamental weakness is that MCP mixes instructions and data in the same field. A tool's description sits in the agent's working memory right next to the agent's real orders, so editing that description can steer the agent as effectively as rewriting its system prompt. The agent has no reliable way to tell an honest instruction from a malicious one slipped in by whoever maintains the tool. Microsoft is careful to note this is not a bug in Copilot itself but a structural trust gap introduced by plugging in outside tools.

Microsoft's hard rules for defenders

Microsoft offers clear, operational guidance for defenders:

  • Treat every connected tool as part of your supply chain: keep a list of approved tool publishers, turn off "allow all," and permit agents to use only the specific tools they need.
  • Treat tool descriptions like system prompts: review any changes as you would a code change and scan text for commands that don't belong in a help field.
  • Put a human in front of risky actions: anything that moves money, shares data outside the company, or changes accounts should require human approval.
  • Give each agent its own identity and monitor it: log actions, set baselines for normal behavior, and flag new endpoints, unusually large data pulls, or odd queries.
  • Apply "least agency," not just least privilege: even low-permission agents can cause harm if allowed to act without human checks.

Microsoft maps these steps to its products — Prompt Shields, Purview DLP, Entra Agent ID, Defender for Cloud, and Sentinel — while stressing that the principles apply regardless of vendor stack.

Prior research and real-world incidents

Microsoft's warning builds on an identifiable trail of research and incidents. Invariant Labs coined "tool poisoning" in April 2025 with a proof of concept that hid instructions in a calculator tool's description and caused the Cursor editor to leak a user's SSH key; developer Simon Willison investigated that case. A related proof showed a malicious GitHub issue could hijack an agent connected to GitHub's MCP server to pull data from private repositories; OWASP cited that episode under "Agentic Supply Chain Vulnerabilities" in its December 2025 Top 10 for Agentic Applications. Koi Security found a real-world supply-chain failure in September 2025: an npm package named postmark-mcp mirrored a legitimate email tool for many clean releases before version 1.0.16 added a single line that secretly BCC'd every agent email to an attacker. Academics released the MCPTox benchmark in August 2025, testing poisoned tool descriptions against 45 real MCP servers and 20 leading AI models; it reported a success rate as high as 72.8 percent and that models almost never refused.

What this means for security teams, procurement leaders, and end users

Security teams should assume tools are part of the supply chain and bake description reviews, agent identities and logging into deployment processes. Procurement leaders will need to limit "allow all" approvals, require vendors to document description-change procedures, and demand re-approval triggers. End users and analysts should expect human approval gates for money transfers, external data sharing, and account changes and to see agents operate with distinct identities and visible logging.

Microsoft's takeaway is succinct and stark: AI that can act is only as trustworthy as the tools you let it touch, and right now those tools are easy to poison and hard to watch.

Original story