"AI is already accelerating vulnerability discovery, reducing the effort needed to identify, validate, and weaponize flaws," Ryan Dewhurst, watchTowr's Head of Threat Intelligence, told The Hacker News.
Google Threat Intelligence Group: indicators of an LLM-generated exploit
Google's Threat Intelligence Group (GTIG) disclosed on Monday that it had identified a previously unknown threat actor using a zero-day exploit that, GTIG assessed with high confidence, had been developed with an artificial intelligence model. GTIG said the exploit was implemented in a Python script bearing multiple hallmarks commonly associated with large language model (LLM) generated code: abundant educational docstrings, a hallucinated CVSS score, and a structured, textbook Pythonic format that included a clean _C ANSI color class and detailed help menus.
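To make those hallmarks concrete, the sketch below counts three of them in a piece of Python source: how many definitions carry docstrings, how many ANSI color escapes appear, and whether a CVSS score is mentioned. It is an illustration only and not GTIG's methodology; the function name llm_style_signals, the regular expressions, and the sample script are all assumptions made for this example, and none of these signals proves machine authorship on its own.

```python
import ast
import re

# Hypothetical heuristics: literal ANSI color escapes written in source text,
# and a CVSS mention followed closely by a score such as 9.8.
ANSI_ESCAPE = re.compile(r"\\(?:033|x1b)\[[0-9;]*m")
CVSS_MENTION = re.compile(
    r"CVSS[^\n]{0,20}?\b(?:10\.0|[0-9]\.[0-9])\b", re.IGNORECASE
)


def llm_style_signals(source: str) -> dict:
    """Return coarse stylistic signals for a Python source string."""
    tree = ast.parse(source)
    defs = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    documented = [node for node in defs if ast.get_docstring(node)]
    return {
        "definitions": len(defs),
        "documented": len(documented),
        "docstring_ratio": len(documented) / len(defs) if defs else 0.0,
        "ansi_color_codes": len(ANSI_ESCAPE.findall(source)),
        "cvss_mentions": len(CVSS_MENTION.findall(source)),
    }


# Made-up sample script, written only to exercise the heuristics.
SAMPLE = r'''
class _C:
    """ANSI color codes for terminal output."""
    RED = "\033[91m"
    RESET = "\033[0m"


def check_target(url):
    """Check whether the target is reachable.

    Severity: CVSS 9.8 (Critical)
    """
    return url.startswith("https://")


def helper(x):
    return x
'''

signals = llm_style_signals(SAMPLE)
print(signals)
```

On the sample, two of the three definitions are documented, two color escapes are found, and one CVSS mention is flagged. Plenty of human-written tools share these traits, so signals like these are better read as prompts for closer review than as attribution.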
The zero-day 2FA bypass: what was exploited and how
GTIG described the vulnerability as a two-factor authentication (2FA) bypass in "a popular open-source, web-based system administration tool." The company did not name the tool. According to GTIG, exploitation required valid user credentials and stemmed from a "high-level semantic logic flaw" that arose from a hard-coded trust assumption — a kind of issue that, the report says, LLMs are particularly good at spotting. Google said it worked with the impacted vendor to responsibly disclose the flaw and have it fixed in order to "proactively disrupt the activity."
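GTIG did not publish the vulnerable code or name the tool, so the following is only a generic sketch of what a hard-coded trust assumption in a login flow can look like. Every name here (TRUSTED_PROXY, LoginRequest, login_flawed, login_fixed) is hypothetical and chosen for illustration; the actual flaw may differ in every detail.

```python
from dataclasses import dataclass

# Hypothetical hard-coded trust assumption: requests that claim to come from
# this address are treated as already verified.
TRUSTED_PROXY = "127.0.0.1"


@dataclass
class LoginRequest:
    username: str
    password_ok: bool  # outcome of the password check
    otp_ok: bool       # outcome of the second-factor check
    client_ip: str     # address as reported by the request; attacker-influenced


def login_flawed(req: LoginRequest) -> bool:
    """Skips the second factor whenever the request claims a trusted origin."""
    if not req.password_ok:
        return False
    if req.client_ip == TRUSTED_PROXY:
        return True  # logic flaw: 2FA bypassed on an unverified claim
    return req.otp_ok


def login_fixed(req: LoginRequest) -> bool:
    """Requires both factors on every code path."""
    return req.password_ok and req.otp_ok


# An attacker with valid credentials, no one-time code, and a spoofed origin.
attempt = LoginRequest("admin", password_ok=True, otp_ok=False, client_ip="127.0.0.1")
print(login_flawed(attempt))  # True
print(login_fixed(attempt))   # False
```

The sketch mirrors the two properties GTIG described: the bypass still requires valid credentials, and nothing crashes or corrupts memory. The code runs as written, and the error lies in what it chooses to trust.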
PromptSpy: an AI-enabled Android backdoor with autonomous features
GTIG's disclosure also describes a separate Android backdoor called PromptSpy that abuses an LLM to analyze the current screen and provide instructions to pin a malicious app in the recent apps list. Google said PromptSpy includes an autonomous agent module that can navigate the Android user interface and interpret real-time user activity to choose next actions without human intervention.
- PromptSpy is capable of capturing victim biometric data to replay authentication gestures such as a lock screen PIN or pattern, enabling the malware to regain access to a compromised device.
- To prevent removal, the malware uses an "AppProtectionDetector" module that identifies on-screen coordinates of the "Uninstall" button and serves an invisible overlay over the button to block touch events and make the button appear unresponsive.
- Google said PromptSpy initializes using hardcoded default infrastructure and credentials but is designed for operational resilience: its command-and-control (C2) infrastructure, including Gemini API keys and a VNC relay server, can be updated dynamically via the C2 channel, so operators can rotate components at runtime without redeploying the payload.
- Google reported no apps containing PromptSpy were found on the Play Store and said it disabled all assets related to the malicious activity.
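The uninstall-blocking behavior described above comes down to simple geometry: a window placed over a button's on-screen bounds intercepts the touches meant for it. The sketch below shows, from a defender's or analyst's point of view, how one might measure how much of a control is covered by another window. The Rect type, the covered_fraction helper, and all coordinates are assumptions for illustration; this is neither PromptSpy's code nor an Android API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned screen rectangle in pixels."""
    left: int
    top: int
    right: int
    bottom: int

    def area(self) -> int:
        # Inverted or empty rectangles count as zero area.
        return max(0, self.right - self.left) * max(0, self.bottom - self.top)

    def intersection(self, other: "Rect") -> "Rect":
        return Rect(
            max(self.left, other.left),
            max(self.top, other.top),
            min(self.right, other.right),
            min(self.bottom, other.bottom),
        )


def covered_fraction(button: Rect, overlay: Rect) -> float:
    """Fraction of the button's area that lies under the overlay."""
    if button.area() == 0:
        return 0.0
    return button.intersection(overlay).area() / button.area()


# Hypothetical coordinates for an "Uninstall" button and three windows.
uninstall = Rect(100, 800, 400, 900)
print(covered_fraction(uninstall, Rect(90, 790, 410, 910)))  # 1.0
print(covered_fraction(uninstall, Rect(0, 0, 250, 850)))     # 0.25
print(covered_fraction(uninstall, Rect(500, 0, 600, 100)))   # 0.0
```

A fully covering window that draws nothing would match the "unresponsive button" symptom Google described, although a coverage number alone cannot distinguish a malicious overlay from a legitimate dialog.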
Broader patterns: Gemini-related abuse, agentic tools, and shadow APIs
Google's report places the zero-day and PromptSpy cases in a broader pattern of AI-enabled misuse. The company listed multiple examples of model-assisted activity: a suspected China-nexus group UNC2814 prompting Gemini to act as a network security expert to support vulnerability research; the North Korean actor APT45 feeding "thousands of repetitive prompts" to recursively analyze CVEs; APT27 using Gemini to accelerate development of a fleet management app; and Russia-nexus activity delivering AI-enabled malware named CANFAIL and LONGSTREAM that use LLM-generated decoy code.
GTIG also described adversaries priming models with historical vulnerability datasets, including experiments with a specialized GitHub repository named "wooyun-legacy" that acts as a Claude Code skill plugin containing over 5,000 vulnerability cases harvested from the WooYun disclosure platform. Priming in this way "facilitates in-context learning to steer the model to approach code analysis like a seasoned expert," Google said.
Separately, Google highlighted a thriving grey market for API relay platforms and "shadow" APIs. A March 2026 study by academics at the CISPA Helmholtz Center for Information Security identified 17 shadow APIs claiming to provide indirect access to official model services; their evaluation found that model performance can drop substantially on some benchmarks (for example, Gemini-2.5-flash fell from 83.82% accuracy to approximately 37.00% on MedQA when routed through shadow APIs). GTIG warned that proxy services can capture every prompt and response, creating opportunities to exfiltrate sensitive data and to fine-tune models illicitly.
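To put the MedQA figures in perspective, the short calculation below converts the two reported accuracies into an absolute and a relative drop. The helper name accuracy_drop is made up for this example, and because the relayed figure is reported as approximate, the results are approximate too.

```python
def accuracy_drop(direct: float, relayed: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = direct - relayed
    relative = absolute / direct * 100
    return absolute, relative


# Figures quoted above for Gemini-2.5-flash on MedQA.
absolute, relative = accuracy_drop(83.82, 37.00)
print(f"{absolute:.2f} points, {relative:.1f}% relative")
# prints "46.82 points, 55.9% relative"
```

In other words, the relayed service lost more than half of the accuracy measured through the official route on that benchmark.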
What this means for technologists, enterprises, and end users
- Technologists and security teams: Expect faster discovery and weaponization timelines. GTIG's analysis found LLM-style artifacts in exploit scripts and showed how AI can identify semantic logic flaws (hard-coded trust assumptions) that traditionally required manual analysis.
- Enterprises and open-source maintainers: The zero-day required valid credentials but targeted a widely used administration tool. Vendors and maintainers should prioritize finding and fixing hard-coded trust paths, and coordinate disclosure as Google says it did with the impacted vendor.
- End users and device owners: PromptSpy demonstrates how AI-enabled malware can automate screen analysis, replay biometric gestures, and obstruct uninstallation. These behaviors increase operational resilience and complicate remediation even when malicious apps are not present on official app stores.
Google's findings sketch a picture in which adversaries combine LLMs, agentic tooling, and proxy marketplaces to compress timelines for discovery, weaponization, and large-scale misuse. The immediate questions are concrete: which defensive practices will scale to handle LLM-augmented exploit discovery, and how will maintainers of widely used tools respond to logic-level trust assumptions that models are now surfacing at scale?




