What should a governance team do today?

Three actions inside two weeks. First, inventory every production agent and mark the trifecta legs for each: does it access private data, does it process untrusted content, does it communicate externally? Almost every agent will mark all three. Second, define the containment scope per agent: per-user data scoping (not service-account-wide access), per-task tool catalog (not the full registered toolset), allowlisted destinations (not the open internet). Third, install runtime observation: every tool invocation logged with the input that triggered it, real-time policy evaluation that can block before execution and a daily review of policy violations. The governance question is not whether the trifecta is present (it always is) but whether the containment architecture is operating.

The lethal trifecta: governing the three capabilities you can't remove

Q: What is the lethal trifecta?

The lethal trifecta is a framing Simon Willison published in June 2025 for the indirect prompt injection threat against AI agents. The three capabilities are: access to private data (the agent can read emails, documents, databases), exposure to untrusted content (the agent processes input from external sources such as emails, shared documents, web pages), and the ability to externally communicate (the agent can make outbound HTTP calls, render images, send messages). When all three are present, an attacker can embed instructions in untrusted content that cause the agent to retrieve private data and send it to an attacker-controlled endpoint. The agent is doing its job correctly, just on behalf of the wrong principal.

Q: What happened in January 2026?

Between January 7 and January 15 2026, the security firm PromptArmor publicly disclosed four critical indirect-prompt-injection vulnerabilities in production AI tools used at Fortune 500 scale. IBM Bob (a coding agent) could be made to download and execute malware via process substitution in shell commands. Notion AI exfiltrated salary data, candidate feedback and diversity goals from a hiring tracker via a poisoned resume PDF. Superhuman AI exfiltrated emails containing financial, legal and medical content via Google Forms. Claude Cowork exfiltrated real estate loan estimates and partial Social Security numbers via a malicious skill document. Every incident hit all three legs of the trifecta. Notion AI's case is particularly brutal: the agent rendered the AI-generated edit, including the malicious image URL carrying exfiltrated data, before the user could approve or reject it.

Q: How does containment differ from prevention?

Prevention says: do not give the agent access to private data, or do not let it read untrusted content or block external communication. Each option breaks a useful agent. Containment says: assume the trifecta is present and design the runtime to limit blast radius. Five controls do the work: tight scope on private-data access (the agent reads only what this user is allowed to read, not what the agent's service account can read), narrow tool catalogs (the agent has access only to the tools its current task requires, refreshed per session), allowlisted exfiltration channels (outbound network is restricted to verified destinations, not the open internet), runtime policy enforcement that intercepts violations before execution and audit trails that capture every tool invocation with the input that triggered it.

Five days, four exploits, one pattern#

Between January 7 and January 15 2026, security researchers publicly disclosed indirect-prompt-injection vulnerabilities in four production AI tools used at Fortune 500 scale. The disclosures came one after another. The same firm wrote four of them. The same pattern broke each system.

Date	Product	Impact
Jan 7	IBM Bob (coding agent, closed beta)	Attackers could induce the agent to download and execute arbitrary malware by exploiting process substitution in shell-command sanitization
Jan 7	Notion AI (publicly disclosed; reported Dec 24)	Salary data, candidate feedback, diversity hiring goals exfiltrated from a hiring tracker via poisoned resume PDF
Jan 12	Superhuman AI (Grammarly-acquired)	Recent emails (financial, legal, medical) exfiltrated to Google Forms via CSP-whitelisted destination
Jan 13-15	Claude Cowork (Anthropic)	Real estate loan estimates and partial Social Security numbers exfiltrated via malicious skill document and whitelisted API

PromptArmor researched and disclosed all four. The pattern in every case was the same: the agent had access to private data the user trusted it with, the agent processed untrusted content that contained instructions and the agent could send data outbound. An attacker made the agent retrieve the data and exfiltrate it. The user often saw nothing.

Simon Willison named this configuration the lethal trifecta on June 16 2025. Seven months later, January confirmed the framing.

If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to that attacker.

What is the lethal trifecta: the three legs#

Willison’s three components, named precisely:

Access to private data. The agent can read your emails, your documents, your databases, your hiring tracker, your CRM. This is the access surface that makes the agent useful. An assistant that cannot read what you have access to cannot help you with what you are doing.

Exposure to untrusted content. The agent processes input that originates outside the trust boundary. A shared document that someone sent you. A resume from a candidate. An email in your inbox. A web page the agent fetched as part of research. The instructions embedded in that content reach the agent’s context and the agent is poorly equipped to distinguish “data the user wants summarized” from “instructions the attacker wants executed.”

The ability to externally communicate. The agent can make outbound HTTP requests, render images that fetch URLs, write to APIs, send messages. The exfiltration channel does not need to be obvious or attacker-owned. It can be a Google Form, a markdown image rendered by the host application, an authorized API call to a service the user trusts. Anything that carries data out of the trust boundary qualifies.

The arithmetic is direct. Combine all three and exfiltration is one well-crafted instruction away.

Simon Willison's June 2025 blog post titled 'The lethal trifecta for AI agents: private data, untrusted content and external communication' showing the three-circle diagram of the trifecta — Simon Willison's original framing on June 16 2025. The three-circle diagram has become the canonical visual for the indirect prompt injection threat against agents. simonwillison.net

Why prevention does not work#

The first instinct of most governance programs is to remove a leg.

Strip the agent’s access to private data and it becomes a stateless chatbot. Useful for general questions. Useless for “summarize the candidates we are interviewing this week” or “draft a response based on the customer’s last six emails.”

Forbid exposure to untrusted content and the agent cannot read an email, an attached document, a shared file or a web page. It cannot do research. It cannot triage incoming requests. It cannot help with any inbound communication.

Block external communication and the agent cannot answer most questions, cannot send replies, cannot integrate with any system that requires an API call. It becomes a closed-loop note-taker.

Each option deletes a feature the agent was bought to provide. The trifecta is not a configuration error; it is a description of useful agents. A defender who insists on prevention is a defender who never deploys agents.

The realistic move is to assume all three legs are present and design the runtime so the attack class is contained when it fires.

The five-control containment architecture#

Five controls, applied together, contain the trifecta. None of them prevents the indirect prompt injection from being attempted. All of them limit what the attempt can accomplish.

LLM security focused on single model interactions. Agentic security addresses what happens when those models can plan, persist and delegate across tools and systems.

Per-user data scoping, not service-account-wide access. The agent reads what this specific user is allowed to read, not the union of everything the agent’s service account can reach. If the agent is operating on behalf of a junior employee, it does not have access to executive compensation data even if its service account technically does. Scope is enforced at the data layer, not at the prompt layer. A prompt injection cannot widen scope it does not have.

Per-task tool catalog, refreshed per session. The agent has access to the tools its current task requires and nothing else. A research agent does not have email-sending tools. A drafting agent does not have file-deletion tools. The catalog is the contract. Adding a tool is a deliberate, audited change. An attacker cannot trick the agent into using a tool that is not in the current catalog because the call simply will not resolve.

Allowlisted exfiltration channels. Outbound network from the agent runtime is restricted to verified destinations. Not the open internet. Not “any HTTPS endpoint.” A specific list of services the user has authorized for this task. Markdown image rendering and similar passive exfiltration vectors are stripped or sanitized at the rendering layer before the user’s browser fetches anything. The Notion AI incident hinged on the fact that Notion rendered the AI-generated edit, including a malicious image URL carrying exfiltrated data, before the user could approve. An allowlisted rendering layer breaks that path.

Runtime policy enforcement that intercepts before execution. Every tool call, every API request, every action proposed by the agent passes through a policy evaluation step before it executes. The policy can block based on the action, the input, the user, the data classification or the destination. The policy is not a prompt instruction the agent could be tricked into ignoring. It is an external enforcement layer the agent cannot bypass. This is what guardian agents provide and where policy-as-code for AI agents becomes operationally necessary.

Verifiable audit trails of every tool invocation. Every action is logged with the input that triggered it, the policy in force at the time, the outcome and the user context. When an incident occurs (and it will), the operator can reconstruct the full chain: which untrusted content was being processed, which instruction was executed, what data was touched, what destination was contacted. Without the audit trail, the post-mortem stops at “the agent did something it should not have.” With the audit trail, the post-mortem produces an answer.

These five controls are not a checklist of recommendations. They are the architecture that makes a trifecta-bearing agent safe enough to run.

How the January incidents would have failed differently#

Apply the architecture to each disclosed incident.

IBM Bob would still have hit the prompt injection. With per-task tool catalogs, the agent would not have had the shell-execution tool available outside an explicit code-review task scope. With allowlisted destinations, the malware download URL would not have resolved. With runtime policy enforcement, the unsanctioned process substitution would have been intercepted before execution. The injection happens. The exploit does not.

Notion AI would still have rendered the malicious instruction in the AI-generated edit. With allowlisted exfiltration, the markdown image URL pointing at an attacker-controlled endpoint would not have been fetched. With per-user data scoping, the agent would have had access only to the documents the requesting user was authorized to see, not the full hiring tracker. With audit logging, the attempt would have been visible to the security team within minutes, not days.

Superhuman AI exfiltrated to a Google Form. CSP whitelisting permitted Google as a destination because Google services are trusted. With per-task allowlisting (Google Forms is not a destination this email-summarization task requires), the exfiltration would have failed. With runtime policy evaluation, the outbound call patterns would have been flagged. With audit trails, the exfiltration of email content into a Form would have produced a clear signature.

Claude Cowork processed a malicious skill document and exfiltrated via a whitelisted API. With per-task scoping, the skill document’s instructions would not have had the authority to invoke arbitrary APIs. With per-user data scoping, the financial and partial-SSN data would have been outside the agent’s read scope without explicit user authorization for this task.

The injection happens in every scenario. The damage does not.

Lethal trifecta vs. direct prompt injection#

The trifecta is not a synonym for prompt injection. It is the architectural shape that makes one specific class of prompt injection (indirect, content-borne) cause data exfiltration.

Direct prompt injection happens at the prompt boundary. A user types a malicious instruction into a chat box and the model obeys. The attacker is the user. The blast radius is whatever that user could already do.

A jailbreak. A policy violation. A bad output. Annoying. Not usually a breach.

Indirect prompt injection happens through content the agent fetches. The attacker is not the user. The attacker is whoever wrote the resume PDF the agent is summarizing, the email the agent is reading, the shared doc the agent is paraphrasing.

The user trusts the content. The agent processes it. The instruction inside it executes with the user’s authority. This is where the trifecta lives.

The three legs explain why indirect prompt injection translates to a breach. Private data access supplies the payload. Untrusted content supplies the instruction. External communication supplies the exit. None of those three on its own is fatal. All three together is the configuration that produced the January 2026 disclosures.

A related but distinct attack class is memory poisoning, where the malicious instruction is written into the agent’s persistent memory store and fires on a later user query. The trifecta still applies (private data, untrusted content, external communication) but the temporal decoupling makes real-time detection harder. See memory poisoning and OWASP ASI06 for the longer-form treatment.

What governance teams should do this week#

Three actions, sequenced.

First, inventory the trifecta in your agent fleet. Every agent in production. For each, mark whether it has private-data access, untrusted-content exposure and external-communication capability. Most useful agents will mark all three. The point of the inventory is not to remove legs (you cannot) but to know which agents have which exposure and to size the containment effort accordingly. This is what a centralized agent registry is for.

Second, define containment scope per agent. For each trifecta-bearing agent, write down the three scopes: which users’ data, which tools (with the actual list, not “the registered toolset”), which destinations. If your agents currently run with service-account-wide access, the full tool registry and unrestricted egress, you are operating in the configuration that produced the January 2026 disclosures. Tighten the scopes. Document them. Make them the deployment contract.

Third, install runtime observation. Every tool invocation with input. Real-time policy evaluation with blocking authority. Daily review of violations. Weekly review of agent behavior against continuous certification baselines. If you are reading post-incident logs through grep on production servers, you are not operating runtime observation. If a security person cannot answer “show me every outbound API call this agent made yesterday” in under a minute, the audit infrastructure is missing.

Six months from now, the next round of disclosures will hit. The pattern will be the same. The agents whose operators implemented containment will see the injection attempts and shrug. The agents whose operators relied on prevention will be on the disclosure list.

Jon Krohn's October 2025 deep-dive on the lethal trifecta and what containment looks like in practice. YouTube

How Roval implements containment#

Roval was built for this configuration. The platform maintains the agent inventory, classifies agents by risk tier (with the trifecta legs as inputs), enforces per-task tool catalogs, runs runtime policy evaluation with the authority to intercept before execution and produces the audit trail every post-mortem and every regulatory examination requires. The eight pillars of the Roval framework map to the five containment controls and add the certification, observability and incident-response infrastructure that makes them operate as a system rather than as a checklist.

For the operational neighbour to this article, see agent incident response playbook for what to do when the injection succeeds anyway and policy-as-code for AI agents for the runtime-enforcement layer described in control 4.

Frequently asked questions about the lethal trifecta#

What is the lethal trifecta?#

Simon Willison’s framing for the indirect prompt injection threat against AI agents. Three capabilities (access to private data, exposure to untrusted content, the ability to communicate externally) combine to make data exfiltration trivial. An attacker embeds instructions in untrusted content that cause the agent to retrieve private data and send it to an attacker-controlled endpoint. The agent is doing its job correctly, just on behalf of the wrong principal.

Why is the lethal trifecta hard to fix?#

Every useful agent has all three legs. Removing private data access reduces the agent to a stateless chatbot. Removing untrusted content exposure means the agent can never read an email, a shared doc or a web page. Removing external communication means the agent cannot answer most questions about the world.

The trifecta is not a bug. It is the definition of an agent that does work. Defenders cannot prevent the attack class by deleting capabilities. They have to contain it.

What happened in January 2026?#

Between January 7 and January 15, the security firm PromptArmor publicly disclosed four critical indirect-prompt-injection vulnerabilities in production AI tools used at Fortune 500 scale: IBM Bob (malware execution via shell-command substitution), Notion AI (salary and hiring data exfiltrated via poisoned resume PDF), Superhuman AI (email content exfiltrated via Google Forms) and Claude Cowork (real estate loan estimates and partial SSNs exfiltrated via a malicious skill document). Every incident hit all three legs of the trifecta.

How does containment differ from prevention?#

Prevention says do not give the agent access to private data or do not let it read untrusted content or block external communication. Each option breaks a useful agent. Containment assumes the trifecta is present and limits the blast radius through five runtime controls: per-user data scoping, per-task tool catalogs, allowlisted exfiltration channels, runtime policy enforcement that intercepts violations before execution and verifiable audit trails of every tool invocation.

How does the trifecta map to OWASP and EU AI Act obligations?#

OWASP’s 2026 Top 10 for Agentic Applications lists indirect prompt injection (ASI01) as the leading risk category. The lethal trifecta is the architectural pattern that makes ASI01 exploitable. EU AI Act Article 14 (human oversight) and Article 12 (logging) implicitly require the trifecta to be containable: the system must produce evidence sufficient for a human to identify when the agent acted on instructions from untrusted content, and the operator must be able to halt and audit.

Who coined the term lethal trifecta?#

Simon Willison, the British software engineer and creator of Datasette, named the configuration on June 16 2025 in a blog post titled “The lethal trifecta for AI agents: private data, untrusted content and external communication.” Security teams adopted the framing quickly because it gave a precise three-circle name to a pattern researchers had been describing in long paragraphs. It is now the standard reference for indirect prompt injection exposure in OWASP, HiddenLayer and PromptArmor write-ups.

How does the lethal trifecta differ from direct prompt injection?#

Direct prompt injection happens when the user is the attacker. The blast radius is whatever the attacking user could already do: a jailbreak, a policy violation, a bad output. The trifecta describes indirect prompt injection, where the attacker is the author of the content the agent fetches, not the user. The user trusts the content. The agent processes it. The malicious instruction executes with the user’s authority and the trifecta supplies the exfiltration path.

Can input filters or guardrails prevent the lethal trifecta?#

No. Filters can catch some injection attempts in some contexts. They do not eliminate the attack class because an instruction is just text. A filter cannot reliably distinguish “data the user wants summarized” from “instructions the attacker wants executed.” PromptArmor’s January 2026 disclosures bypassed input filters at every vendor. Containment is the architecture that holds. Filters are a useful supplement, never a replacement.

What should a governance team do this week?#

Three actions inside two weeks. Inventory every production agent and mark its trifecta legs (almost every agent will mark all three). Define the containment scope per agent: which users’ data, which tools (with the actual list), which destinations. Install runtime observation: every tool invocation logged with the input that triggered it, real-time policy evaluation with blocking authority and a daily review of violations. The Roval agent registry is where the trifecta inventory and the per-agent containment scope live operationally.

Sources#

Source	Date	URL
Simon Willison, “The lethal trifecta for AI agents”	June 16 2025	https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
Breached.Company, “The Lethal Trifecta Strikes: Four Major AI Agent Vulnerabilities in Five Days”	January 2026	https://breached.company/the-lethal-trifecta-strikes-four-major-ai-agent-vulnerabilities-in-five-days/
PromptArmor, IBM Bob malware execution disclosure	January 7 2026	https://www.promptarmor.com/resources/ibm-ai-(-bob-)-downloads-and-executes-malware
PromptArmor, Notion AI data exfiltration disclosure	January 7 2026	https://www.promptarmor.com/resources/notion-ai-unpatched-data-exfiltration
PromptArmor, Superhuman AI email exfiltration	January 12 2026	https://www.promptarmor.com/resources/superhuman-ai-exfiltrates-emails
PromptArmor, Claude Cowork file exfiltration	January 13-15 2026	https://www.promptarmor.com/resources/claude-cowork-exfiltrates-files
HiddenLayer, “How the Lethal Trifecta Expose Agentic AI”	2026	https://www.hiddenlayer.com/research/the-lethal-trifecta-and-how-to-defend-against-it
OWASP, Top 10 for Agentic Applications 2026	December 2025	https://genai.owasp.org/2025/12/09/owasp-genai-security-project-releases-top-10-risks-and-mitigations-for-agentic-ai-security/