The lethal trifecta: governing the three capabilities you can't remove
Five days, four exploits, one pattern#
Between January 7 and January 15 2026, security researchers publicly disclosed indirect-prompt-injection vulnerabilities in four production AI tools used at Fortune 500 scale. The disclosures came one after another. The same firm wrote four of them. The same pattern broke each system.
| Date | Product | Impact |
|---|---|---|
| Jan 7 | IBM Bob (coding agent, closed beta) | Attackers could induce the agent to download and execute arbitrary malware by exploiting process substitution in shell-command sanitization |
| Jan 7 | Notion AI (publicly disclosed; reported Dec 24) | Salary data, candidate feedback, diversity hiring goals exfiltrated from a hiring tracker via poisoned resume PDF |
| Jan 12 | Superhuman AI (Grammarly-acquired) | Recent emails (financial, legal, medical) exfiltrated to Google Forms via CSP-whitelisted destination |
| Jan 13-15 | Claude Cowork (Anthropic) | Real estate loan estimates and partial Social Security numbers exfiltrated via malicious skill document and whitelisted API |
PromptArmor researched and disclosed all four. The pattern in every case was the same: the agent had access to private data the user trusted it with, the agent processed untrusted content that contained instructions and the agent could send data outbound. An attacker made the agent retrieve the data and exfiltrate it. The user often saw nothing.
Simon Willison named this configuration the lethal trifecta on June 16 2025. Seven months later, January confirmed the framing.
If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to that attacker.
The three legs#
Willison’s three components, named precisely:
Access to private data. The agent can read your emails, your documents, your databases, your hiring tracker, your CRM. This is the access surface that makes the agent useful. An assistant that cannot read what you have access to cannot help you with what you are doing.
Exposure to untrusted content. The agent processes input that originates outside the trust boundary. A shared document that someone sent you. A resume from a candidate. An email in your inbox. A web page the agent fetched as part of research. The instructions embedded in that content reach the agent’s context and the agent is poorly equipped to distinguish “data the user wants summarized” from “instructions the attacker wants executed.”
The ability to externally communicate. The agent can make outbound HTTP requests, render images that fetch URLs, write to APIs, send messages. The exfiltration channel does not need to be obvious or attacker-owned. It can be a Google Form, a markdown image rendered by the host application, an authorized API call to a service the user trusts. Anything that carries data out of the trust boundary qualifies.
The arithmetic is direct. Combine all three and exfiltration is one well-crafted instruction away.

Why prevention does not work#
The first instinct of most governance programs is to remove a leg.
Strip the agent’s access to private data and it becomes a stateless chatbot. Useful for general questions. Useless for “summarize the candidates we are interviewing this week” or “draft a response based on the customer’s last six emails.”
Forbid exposure to untrusted content and the agent cannot read an email, an attached document, a shared file or a web page. It cannot do research. It cannot triage incoming requests. It cannot help with any inbound communication.
Block external communication and the agent cannot answer most questions, cannot send replies, cannot integrate with any system that requires an API call. It becomes a closed-loop note-taker.
Each option deletes a feature the agent was bought to provide. The trifecta is not a configuration error; it is a description of useful agents. A defender who insists on prevention is a defender who never deploys agents.
The realistic move is to assume all three legs are present and design the runtime so the attack class is contained when it fires.
The five-control containment architecture#
Five controls, applied together, contain the trifecta. None of them prevents the indirect prompt injection from being attempted. All of them limit what the attempt can accomplish.
LLM security focused on single model interactions. Agentic security addresses what happens when those models can plan, persist and delegate across tools and systems.
Per-user data scoping, not service-account-wide access. The agent reads what this specific user is allowed to read, not the union of everything the agent’s service account can reach. If the agent is operating on behalf of a junior employee, it does not have access to executive compensation data even if its service account technically does. Scope is enforced at the data layer, not at the prompt layer. A prompt injection cannot widen scope it does not have.
Per-task tool catalog, refreshed per session. The agent has access to the tools its current task requires and nothing else. A research agent does not have email-sending tools. A drafting agent does not have file-deletion tools. The catalog is the contract. Adding a tool is a deliberate, audited change. An attacker cannot trick the agent into using a tool that is not in the current catalog because the call simply will not resolve.
Allowlisted exfiltration channels. Outbound network from the agent runtime is restricted to verified destinations. Not the open internet. Not “any HTTPS endpoint.” A specific list of services the user has authorized for this task. Markdown image rendering and similar passive exfiltration vectors are stripped or sanitized at the rendering layer before the user’s browser fetches anything. The Notion AI incident hinged on the fact that Notion rendered the AI-generated edit, including a malicious image URL carrying exfiltrated data, before the user could approve. An allowlisted rendering layer breaks that path.
Runtime policy enforcement that intercepts before execution. Every tool call, every API request, every action proposed by the agent passes through a policy evaluation step before it executes. The policy can block based on the action, the input, the user, the data classification or the destination. The policy is not a prompt instruction the agent could be tricked into ignoring. It is an external enforcement layer the agent cannot bypass. This is what guardian agents provide and where policy-as-code for AI agents becomes operationally necessary.
Verifiable audit trails of every tool invocation. Every action is logged with the input that triggered it, the policy in force at the time, the outcome and the user context. When an incident occurs (and it will), the operator can reconstruct the full chain: which untrusted content was being processed, which instruction was executed, what data was touched, what destination was contacted. Without the audit trail, the post-mortem stops at “the agent did something it should not have.” With the audit trail, the post-mortem produces an answer.
These five controls are not a checklist of recommendations. They are the architecture that makes a trifecta-bearing agent safe enough to run.
How the January incidents would have failed differently#
Apply the architecture to each disclosed incident.
IBM Bob would still have hit the prompt injection. With per-task tool catalogs, the agent would not have had the shell-execution tool available outside an explicit code-review task scope. With allowlisted destinations, the malware download URL would not have resolved. With runtime policy enforcement, the unsanctioned process substitution would have been intercepted before execution. The injection happens. The exploit does not.
Notion AI would still have rendered the malicious instruction in the AI-generated edit. With allowlisted exfiltration, the markdown image URL pointing at an attacker-controlled endpoint would not have been fetched. With per-user data scoping, the agent would have had access only to the documents the requesting user was authorized to see, not the full hiring tracker. With audit logging, the attempt would have been visible to the security team within minutes, not days.
Superhuman AI exfiltrated to a Google Form. CSP whitelisting permitted Google as a destination because Google services are trusted. With per-task allowlisting (Google Forms is not a destination this email-summarization task requires), the exfiltration would have failed. With runtime policy evaluation, the outbound call patterns would have been flagged. With audit trails, the exfiltration of email content into a Form would have produced a clear signature.
Claude Cowork processed a malicious skill document and exfiltrated via a whitelisted API. With per-task scoping, the skill document’s instructions would not have had the authority to invoke arbitrary APIs. With per-user data scoping, the financial and partial-SSN data would have been outside the agent’s read scope without explicit user authorization for this task.
The injection happens in every scenario. The damage does not.
What governance teams should do this week#
Three actions, sequenced.
First, inventory the trifecta in your agent fleet. Every agent in production. For each, mark whether it has private-data access, untrusted-content exposure and external-communication capability. Most useful agents will mark all three. The point of the inventory is not to remove legs (you cannot) but to know which agents have which exposure and to size the containment effort accordingly. This is what a centralized agent registry is for.
Second, define containment scope per agent. For each trifecta-bearing agent, write down the three scopes: which users’ data, which tools (with the actual list, not “the registered toolset”), which destinations. If your agents currently run with service-account-wide access, the full tool registry and unrestricted egress, you are operating in the configuration that produced the January 2026 disclosures. Tighten the scopes. Document them. Make them the deployment contract.
Third, install runtime observation. Every tool invocation with input. Real-time policy evaluation with blocking authority. Daily review of violations. Weekly review of agent behavior against continuous certification baselines. If you are reading post-incident logs through grep on production servers, you are not operating runtime observation. If a security person cannot answer “show me every outbound API call this agent made yesterday” in under a minute, the audit infrastructure is missing.
Six months from now, the next round of disclosures will hit. The pattern will be the same. The agents whose operators implemented containment will see the injection attempts and shrug. The agents whose operators relied on prevention will be on the disclosure list.
How Roval implements containment#
Roval was built for this configuration. The platform maintains the agent inventory, classifies agents by risk tier (with the trifecta legs as inputs), enforces per-task tool catalogs, runs runtime policy evaluation with the authority to intercept before execution and produces the audit trail every post-mortem and every regulatory examination requires. The eight pillars of the Roval framework map to the five containment controls and add the certification, observability and incident-response infrastructure that makes them operate as a system rather than as a checklist.
For the operational neighbour to this article, see agent incident response playbook for what to do when the injection succeeds anyway and policy-as-code for AI agents for the runtime-enforcement layer described in control 4.
Analysis on AI agent governance, regulation and runtime risk. One email a week.