"Automated scanners are brilliant at finding known, signature-based vulnerabilities. But they fail miserably at AI security."
Cobalt’s 2026 State of Pentesting report: a blunt verdict
Offensive security vendor Cobalt released findings in its 2026 State of Pentesting report that signal growing disillusionment with fully automated penetration testing. The company said 78 percent of respondents to its survey experienced "critical false negatives" from automated scanning tools. The survey size was 450 respondents, which Cobalt acknowledged was small, but the company characterized the results as clear: practitioners are encountering missed critical issues when they rely on autonomous scanners.
Why automated tools are missing AI-era flaws
Cobalt framed the shortfall specifically around AI-driven application logic. The firm said automated scanners excel at "known, signature-based vulnerabilities" but "fail miserably at AI security." The report singled out prompt injection and "excessive agency flaws" as categories that require "creative, multi-turn interaction chains" and "adversarial psychology"—patterns Cobalt described as invisible to tools that operate using single-shot automated queries.
Vulnerability severity rises in AI and LLM environments
Cobalt reported a marked difference in severity between traditional and AI environments. About 12 percent of vulnerabilities detected in traditional environments were classified as high or critical severity; in AI and large language model (LLM) environments that figure rose to 32 percent. Cobalt said that 32 percent number has held for the past two years in its pentesting data, suggesting AI-related deployments continue to introduce a higher share of serious flaws.
Responses from vendors and practitioners
The Cobalt data corresponded with a sharp drop in openness to fully automated scanning: just 9 percent of respondents said they were open to a purely automated security scanning approach in 2026, down from 29 percent the previous year. Cobalt described the decline as a "healthy sign," arguing practitioners are "seeing through the vendor hype and demanding actual assurance rather than just coverage." The company also advocated a hybrid model—letting most systems be automatically scanned by AI while reserving human-led testing for the most critical systems; Cobalt sells such a solution.
Other firms have reported similar pressures. Application security firm Veracode told Cobalt's report readers that AI-assisted software development is producing more vulnerabilities than security teams can remediate. Veracode reported that roughly 82 percent of companies are leaving known vulnerabilities unresolved for more than a year, and that high-risk vulnerabilities are increasing as a share of discovered issues.
CJ Moses and the human-in-the-loop caveat
Not all industry voices reject automation wholesale. Amazon security chief CJ Moses told Cobalt's reporters at the RSA Conference in April that AI pentesting tools have made Amazon security teams "40 percent more efficient," though Cobalt noted Moses did not specify the metric behind that figure. Moses nonetheless emphasized limits: "AI pentesting still needs a human in the loop to ensure it doesn't muck something up," he said. He added that AI is effective "when you have large amounts of data and need that big view," but from a decision-making standpoint "it isn't something that we're ready to rely on."
What this means for security teams, procurement leaders, and vendors
- Security teams and technologists: Expect to shift resources back toward human-led testing for critical systems, especially AI/LLM deployments where Cobalt reports a 32 percent high-or-critical rate. Teams will likely triage automation for coverage and humans for logic- and interaction-based flaws.
- Procurement leaders and enterprise owners: The drop from 29 percent to 9 percent in willingness to accept fully automated scanning suggests buyers will demand proof of human oversight and assurances about AI-specific test cases in procurement contracts.
- Automated-pentesting vendors: The Cobalt survey and Veracode's remediation statistics signal pressure to improve detection of multi-turn, adversarial, and prompt-injection style flaws—or to position tools explicitly as coverage enhancers rather than replacements for human testers.
The record Cobalt presents is straightforward: automated scanners still find many known flaws, but in AI-powered environments the stakes are higher and the gaps are more consequential. Organizations weighing efficiency gains—such as the efficiency claim CJ Moses described—must also account for the 78 percent rate of reported critical false negatives and the persistently higher share of severe findings in AI and LLM contexts. The practical result Cobalt recommends is hybrid testing; whether the market adopts that model broadly, or vendors close the detection gap for AI-specific logic flaws, remains the next question.




