Enterprise Data Uploads to AI Models Surge 93% in a Year

“A total of 18,033 TB of data was transferred to AI and machine learning applications during the last year,” the Zscaler 2026 AI Threat Report states — roughly equivalent, the company notes, to 3.6 billion digital photos. Published June 17, the report finds a 93% year‑over‑year jump in enterprise data moved into AI tools, a pattern Zscaler warns increases the risk of data breaches and cyber espionage.

Scale of enterprise uploads: 18,033 TB and 989.3 billion transactions

Zscaler based its findings on an analysis of 989.3 billion AI and ML transactions that passed through its cloud between January and December 2025. The 18,033 TB transferred to AI and machine learning applications in that period reflects what the company characterizes as a 93% year‑over‑year increase in enterprise data transferred by employees to AI tools.
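As a rough sanity check on those figures, the short calculation below works out what the report's photo comparison implies about photo size, and what the average transfer per transaction would be. It assumes decimal units (1 TB = 10^12 bytes) and assumes the transaction count and the data volume cover the same traffic, which the report excerpt does not confirm, so the per-transaction figure is only an indicative mean.

```python
# Back-of-envelope arithmetic on the report's headline numbers.
# Assumption: decimal units, 1 TB = 10**12 bytes.
TOTAL_TB = 18_033          # data transferred to AI/ML apps
PHOTOS = 3.6e9             # the report's photo equivalence
TRANSACTIONS = 989.3e9     # AI/ML transactions analyzed

total_bytes = TOTAL_TB * 10**12

# Photo size implied by "18,033 TB is about 3.6 billion photos".
mb_per_photo = total_bytes / PHOTOS / 10**6

# Mean bytes per transaction, assuming both figures describe the same traffic.
kb_per_transaction = total_bytes / TRANSACTIONS / 10**3

print(f"Implied photo size: {mb_per_photo:.1f} MB")
print(f"Mean volume per transaction: {kb_per_transaction:.1f} KB")
```

Under these assumptions the comparison implies photos of about 5 MB each, and a mean of roughly 18 KB per transaction.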

Grammarly and ChatGPT drive more than half of transfers

The report identifies two services as the primary drivers of these transfers: Grammarly, responsible for 38% of such data movements, and ChatGPT, at 21%. Together they account for 59% of the enterprise data transfers the report measured. Other tools observed in Zscaler’s telemetry include OpenAI, Codium, GitHub Copilot, Perplexity, Microsoft Copilot, Google Gemini and Claude.

ChatGPT violations: 410 million-plus DLP incidents

Zscaler identified more than 410 million Data Loss Prevention (DLP) policy violations tied to ChatGPT, a 99% year‑over‑year increase. The company says these violations involved sensitive categories such as financial records, personally identifiable information (PII), source code, healthcare data and other regulated content. The report emphasizes that employees are often not acting with malice; they are trying to use AI to work more efficiently, a habit the report says can nonetheless carry potentially significant data privacy implications.

Codium and source‑code leakage: 242 million DLP violations

The AI coding assistant Codium also surfaced as a major vector for leakage in Zscaler’s telemetry. The report records over 242 million DLP violations linked to Codium, a 100% year‑over‑year increase. Zscaler highlights the elevated risk this poses to source code and proprietary logic, noting such leakage “could be highly damaging to businesses.”

Zscaler’s recommended controls: inventory, defaults, zero trust, inline inspection

To counter the risks it documents, Zscaler offers four practical controls:

Inventory all GenAI apps and apps with embedded AI functionality: maintain a continuously updated catalog of standalone GenAI tools and every SaaS or internal app that includes AI features.
Disable risky AI defaults: turn off auto‑enabled AI functionality in SaaS and productivity apps until such features are reviewed and configured to match the organization’s risk posture.
Apply zero trust to all model interactions: implement least‑privilege access for every user, service and system that interacts with an AI model.
Enforce AI guardrails with inline inspection: ensure inline inspection across all AI/ML traffic to prevent external malicious activity from compromising AI systems and to stop sensitive data from being exposed via prompts or outputs.
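To make the inline inspection idea concrete, here is a minimal sketch of a check that could run on a prompt before it leaves the network. It is an illustration only, not Zscaler's method: the pattern set, the `inspect_prompt` function and the `Verdict` type are hypothetical names, and production DLP engines rely on far richer classifiers than a handful of regular expressions.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors for illustration; real DLP policies are much broader.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}
# Runs of 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class Verdict:
    allowed: bool
    findings: list[str]


def inspect_prompt(prompt: str) -> Verdict:
    """Flag a prompt if any sensitive pattern is found; block on any finding."""
    findings = [name for name, rx in PATTERNS.items() if rx.search(prompt)]
    for match in CARD_CANDIDATE.finditer(prompt):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            findings.append("payment_card")
            break
    return Verdict(allowed=not findings, findings=findings)


risky = inspect_prompt("Summarize: customer SSN 123-45-6789, card 4111 1111 1111 1111")
safe = inspect_prompt("Rewrite this paragraph to be friendlier.")
print(risky)
print(safe)
```

In an inline deployment, a check of this kind would sit in a proxy between the user and the AI service, so a flagged prompt can be blocked or redacted before it is sent. The block-on-any-finding policy here is a simplifying assumption; a real policy would likely weigh severity, user role and destination.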

“The riskiest AI applications tend to be those that employees use without thinking—writing assistants, coding helpers, or AI features layered into collaboration suites. Their convenience is exactly what makes them higher risk; they see the same sensitive content employees do, often at the moment it’s created,” the report warned.

What this means for technologists, procurement leaders, and end users

Technologists and security teams will need to bring AI/ML traffic under existing DLP and zero‑trust architectures, prioritizing inline inspection and a continuously updated inventory of GenAI and AI‑enabled services.
Procurement and IT leaders will face pressure to review SaaS defaults and to secure contractual visibility into vendor AI features before approving tools, particularly those with auto‑enabled AI functionality.
End users and employees should be made aware that convenience‑driven use of writing assistants and coding helpers can expose financial, PII, healthcare and source‑code data at large scale, even when the intent is simply to be more productive.

Zscaler’s telemetry paints a clear picture: the convenience of embedded and standalone AI tools is driving mass transfers of potentially sensitive enterprise data. The company presents its recommendations (inventory, default controls, least‑privilege access and inline inspection) as immediate steps organizations can take to reduce exposure. Whether enterprises will adopt those steps at scale, and how vendors will respond to calls to change default behaviors, are questions the report’s numbers bring sharply into focus.

Original story: https://www.infosecurity-magazine.com/news/sensitive-ai-data-upload-doubles/