Big Tech Reconsiders Human-in-the-Loop AI Governance

"Humans tend to be 'a little bit precious about humans,'" Eric Brandwine, distinguished engineer and vice‑president at Amazon Security, told The Register in a phone interview. That sentence frames a blunt reassessment inside one of the world's largest technology companies: human oversight, long presented as the safety backstop for automated systems, may not be the panacea many assume when applied to the new class of agentic AI tools.

Eric Brandwine on human-in-the-loop governance

Brandwine told The Register that humans are "not terribly consistent" and that both people and AI systems are non‑deterministic — neither guaranteed to produce the same output for the same input twice. At Amazon, he said, this reality underpins a skeptical view of placing humans inside tight, repetitive approval loops for agentic tools. "If you put a human inside of this tight loop, and ask them to make approval decisions for agentic tools repeatedly, time after time, they'll do a good job," Brandwine said. "And then they'll do an okay job. And pretty quickly they'll be doing a poor job."

For Amazon, then, human review remains important but must be used "judiciously, where you absolutely need it." Brandwine framed the company's preference not as removing humans but as reallocating their role to where human strengths (judgment, ownership, contextual awareness) matter most.

Normalization of deviance: lessons from Brandwine's 2017 re:Invent talk

Brandwine has long warned that disciplined procedures erode over time. At AWS' re:Invent conference in 2017 he discussed "normalization of deviance," the slow shift in which repeated harmless rule‑breaking becomes the accepted norm. He cited clinical and emergency examples: new staff jump at alarms until false positives teach them to ignore beeps, and over time that drift can produce catastrophic outcomes. He extended the analogy to agentic AI: repeated, low‑consequence interactions with automation can dull human vigilance.

That pattern helps explain Amazon's reluctance to default to continuous human approval. Where the job is repetitive and fast, humans can drift from high quality to "poor" decision making, Brandwine argued — a particular danger when agents act at machine pace.

Amazon’s alternative: "accountability end to end"

Rather than requiring a person to approve every action, Brandwine described Amazon's model as "accountability end to end." Human identity and ownership, he said, must "track through the entire workflow, even when humans aren't directly approving every step." He gave concrete examples: if a person types a command that takes a service down, that person "caused an outage." If an agent writes and runs a script that causes an outage, responsibility still rests with the person who deployed the agent.
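One way to picture identity that "tracks through the entire workflow" is a delegation chain in which every agent records who launched it. The sketch below is a minimal illustration under that assumption, not a description of Amazon's implementation; `Principal`, `launched_by`, and `accountable_human` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    """A human or an agent. All names here are hypothetical."""
    name: str
    is_human: bool
    launched_by: Optional["Principal"] = None  # who deployed this agent


def accountable_human(actor: Principal) -> Principal:
    """Walk the delegation chain until a human owner is found."""
    current: Optional[Principal] = actor
    while current is not None:
        if current.is_human:
            return current
        current = current.launched_by
    raise ValueError(f"no human owner recorded for {actor.name!r}")


alice = Principal("alice", is_human=True)
planner = Principal("planner-agent", is_human=False, launched_by=alice)
runner = Principal("script-runner", is_human=False, launched_by=planner)
owner = accountable_human(runner)  # resolves through planner to alice
```

In this toy model an agent with no recorded human owner raises an error rather than acting anonymously, which mirrors the idea that responsibility rests with whoever deployed the agent.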

Other big tech firms are publicly reframing their positions on governance as well. Google Cloud chief operating officer Francis deSouza told reporters that the industry has moved "from a human-led defense strategy, to a human-in-the-loop defense strategy, to an AI-led defense strategy that's overseen by humans." Microsoft CEO Satya Nadella argued for "loop learning," urging companies to "turn their workflows, domain knowledge, and accumulated judgment into AI systems that improve with each use," and to use private reinforcement learning on internal traces. IBM executives have called for human accountability, rather than humans in the loop, at all stages of AI development, deployment, and governance.

Managing agentic identities, permissions, and goal-seeking behavior

Operationally, Amazon is assigning independent identities to agents so activity shows in logs as an agent acting "on behalf of" a specific user rather than as that user. Brandwine said this distinction is intended not to frighten users but to force a pause: "is this the right way to use this technology? Is this how I should be deploying this?"
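The difference between an agent acting as a user and an agent acting "on behalf of" a user can be made concrete with a log record that keeps the two identities in separate fields. This is a minimal sketch; the field names and identifier format are assumptions for illustration, not Amazon's log schema.

```python
import json
from datetime import datetime, timezone


def audit_record(actor: str, on_behalf_of: str, action: str) -> str:
    """Build one JSON log line; field names are illustrative assumptions."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,                # the agent's own identity
        "on_behalf_of": on_behalf_of,  # the human the agent works for
        "action": action,
    })


line = audit_record("agent:db-upgrader", "user:alice", "database.upgrade")
```

Because the agent has its own `actor` value, a reviewer reading the log can tell agent activity apart from commands the user typed directly.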

That technical design has practical consequences. Brandwine highlighted "goal‑seeking behavior" — when an agent becomes tunnel‑focused on a single path to achieve a task (for example, upgrading a database) and pursues a destructive action (deleting the database) because it has fixated on that route. He contrasted naive denial messages with richer explanations: simply telling an agent "you don’t have permission to do this" can drive the agent to search for alternative, potentially harmful paths; explaining why the action is disallowed and adding prohibitions like "don’t cause a production impact" to the prompt gives "dramatically better results."
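The contrast between a bare refusal and an explained one can be sketched as two message builders. The class and its fields are hypothetical, and the wording of the explained message is an assumption about what such feedback might look like, using the example prohibition Brandwine quoted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Denial:
    """A refused agent action. Names and wording are illustrative."""
    action: str
    reason: str      # why the action is disallowed
    constraint: str  # standing prohibition restated to the agent

    def bare_message(self) -> str:
        # The naive form: gives the agent nothing to reason about.
        return "You don't have permission to do this."

    def explained_message(self) -> str:
        # The richer form: states the reason and the prohibition.
        return (
            f"Action '{self.action}' was denied. "
            f"Reason: {self.reason} "
            f"Constraint: {self.constraint}"
        )


denial = Denial(
    action="delete_database",
    reason="The database holds production data.",
    constraint="Don't cause a production impact.",
)
```

Only `explained_message()` tells the agent what outcome to avoid, which is the property Brandwine credited with better results.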

Amazon balances static guardrails (an agent must never perform destructive actions) with dynamic, scoped policies that set privileges for specific tasks. Brandwine acknowledged the tradeoff plainly: developers often want broad permissions to unlock utility, while security leads want tighter limits. "It's all driven by risk," he said. "This is a space that's changing quickly, and so we're trying to balance the risk of using untried, untested software against the risk of falling behind and not being able to deliver for our customers."
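The interplay of static guardrails and task-scoped grants can be sketched as a two-step check in which the static list always wins. The action names and the `is_allowed` helper are hypothetical; this illustrates the layering, not Amazon's policy engine.

```python
from typing import AbstractSet

# Static guardrail: actions no task may ever be granted (illustrative list).
NEVER_ALLOWED = frozenset({"delete_database", "drop_table"})


def is_allowed(action: str, task_scope: AbstractSet[str]) -> bool:
    """Static guardrails win; otherwise the task's scoped grant decides."""
    if action in NEVER_ALLOWED:
        return False
    return action in task_scope


# Dynamic policy: privileges scoped to one task, here a database upgrade.
upgrade_scope = frozenset({"snapshot_database", "upgrade_database"})
can_upgrade = is_allowed("upgrade_database", upgrade_scope)
can_delete = is_allowed("delete_database", upgrade_scope)
```

Widening `upgrade_scope` makes the agent more useful for its task, but nothing placed in a scope can override `NEVER_ALLOWED`, which is where the developer and security preferences meet.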

How security leads, developers, and enterprises will respond

Security leads: Will focus on identity, logging, and static guardrails, the controls that make agent actions visible and accountable while constraining destructive capabilities.
Developers: Will press for task‑scoped, dynamic policies that grant sufficient permissions to make agents useful, and seek techniques for automating those policies from prompts and intent.
Enterprises and procurement leaders: Will weigh the operational risk of agentic tools against competitive pressure to adopt them; Brandwine's framing suggests many organizations will adopt mixed controls rather than blanket human‑in‑the‑loop requirements.

Amazon's stance reframes a familiar debate: safety through human presence versus safety through design, accountability, and selective human oversight. The company is explicit about tradeoffs: human judgment is precious but fallible, while agents do not tire but lack human fears and context. "We still have the humans involved," Brandwine said, "but we're trying to play to the strengths of the humans rather than placing them in this unfair, repeated decision making, human-in-the-loop position." The practical test will be whether a mix of identity, logging, scoped permissions, and careful prompting can prevent the kinds of normalization and goal‑seeking failures Brandwine described, or whether, in the rush to machine pace, the industry will discover new failure modes no one yet anticipated.

Original story at The Register