---
title: "The lethal trifecta: governing the three capabilities you can't remove"
date: 2026-04-30
author: david
excerpt: "Five days in January 2026, four AI productivity tools shipped indirect prompt injection vulnerabilities. Same pattern in every case. Simon Willison named it in June 2025: private data access plus untrusted content plus external communication equals data exfiltration. You can't remove any leg without breaking the agent. Governance has to shift from prevention to containment."
category: governance
tags: [prompt injection, lethal trifecta, indirect prompt injection, governance, containment, runtime security]
draft: false
tldr: "The lethal trifecta is Simon Willison's framing for the indirect prompt injection threat against AI agents. Three capabilities (access to private data, exposure to untrusted content, ability to communicate externally) combine to make data exfiltration trivial — and every useful agent has all three. January 7-15 2026 saw four production exploits in five days (IBM Bob, Notion AI, Superhuman, Claude Cowork) all hitting the same trifecta pattern. Prevention does not work because removing any leg breaks the agent's usefulness. Containment does work: tighten what each leg can do at runtime, observe every action and block exfiltration paths before they execute. This article walks the trifecta, the January incidents, why prevention fails and the five-control containment architecture banks and SaaS operators should be running today."
seo:
  title: "Lethal trifecta: governing AI agents you can't make safe by design | Roval"
  description: "Simon Willison's lethal trifecta (private data, untrusted content, external communication) explains every recent agent prompt-injection breach. Five-control containment architecture for governance teams."
faqs:
  - question: "What is the lethal trifecta?"
    answer: "The lethal trifecta is a framing Simon Willison published in June 2025 for the indirect prompt injection threat against AI agents. The three capabilities are: access to private data (the agent can read emails, documents, databases), exposure to untrusted content (the agent processes input from external sources such as emails, shared documents, web pages), and the ability to externally communicate (the agent can make outbound HTTP calls, render images, send messages). When all three are present, an attacker can embed instructions in untrusted content that cause the agent to retrieve private data and send it to an attacker-controlled endpoint. The agent is doing its job correctly, just on behalf of the wrong principal."
  - question: "Why is the lethal trifecta hard to fix?"
    answer: "Every useful agent has all three legs. Removing private data access reduces the agent to a stateless chatbot. Removing untrusted content exposure means the agent can never read an email, a shared doc or a web page. Removing external communication means the agent cannot answer most questions about the world. The trifecta is not a bug. It is the definition of an agent that does work. Defenders cannot prevent the attack class by deleting capabilities. They must contain it through runtime architecture: scoped authorization, narrow tool catalogs, allowlisted exfiltration channels, real-time policy enforcement and verifiable audit trails."
  - question: "What happened in January 2026?"
    answer: "Between January 7 and January 15 2026, the security firm PromptArmor publicly disclosed four critical indirect-prompt-injection vulnerabilities in production AI tools used at Fortune 500 scale. IBM Bob (a coding agent) could be made to download and execute malware via process substitution in shell commands. Notion AI exfiltrated salary data, candidate feedback and diversity goals from a hiring tracker via a poisoned resume PDF. Superhuman AI exfiltrated emails containing financial, legal and medical content via Google Forms. Claude Cowork exfiltrated real estate loan estimates and partial Social Security numbers via a malicious skill document. Every incident hit all three legs of the trifecta. Notion AI's case is particularly brutal: the agent rendered the AI-generated edit, including the malicious image URL carrying exfiltrated data, before the user could approve or reject it."
  - question: "How does containment differ from prevention?"
    answer: "Prevention says: do not give the agent access to private data, or do not let it read untrusted content, or block its external communication. Each option breaks a useful agent. Containment says: assume the trifecta is present and design the runtime to limit blast radius. Five controls do the work: tight scope on private-data access (the agent reads only what this user is allowed to read, not what the agent's service account can read), narrow tool catalogs (the agent has access only to the tools its current task requires, refreshed per session), allowlisted exfiltration channels (outbound network is restricted to verified destinations, not the open internet), runtime policy enforcement that intercepts violations before execution and audit trails that capture every tool invocation with the input that triggered it."
  - question: "How does the trifecta map to OWASP and EU AI Act obligations?"
    answer: "OWASP's 2026 Top 10 for Agentic Applications opens with Agent Goal Hijack (ASI01), the category that covers indirect prompt injection. The lethal trifecta is the architectural pattern that makes ASI01 exploitable. EU AI Act Article 14 (human oversight) and Article 12 (logging) implicitly require the trifecta to be containable: the system must produce evidence sufficient for a human to identify when the agent has acted on instructions from untrusted content, and the operator must be able to halt and audit. A bank operating an agent with all three trifecta legs and no containment architecture is exposed both to the regulatory obligation and to the underlying technical risk."
  - question: "What should a governance team do today?"
    answer: "Three actions inside two weeks. First, inventory every production agent and mark the trifecta legs for each: does it access private data, does it process untrusted content, does it communicate externally? Almost every agent will mark all three. Second, define the containment scope per agent: per-user data scoping (not service-account-wide access), per-task tool catalog (not the full registered toolset), allowlisted destinations (not the open internet). Third, install runtime observation: every tool invocation logged with the input that triggered it, real-time policy evaluation that can block before execution and a daily review of policy violations. The governance question is not whether the trifecta is present (it always is) but whether the containment architecture is operating."
---

## Five days, four exploits, one pattern

Between January 7 and January 15 2026, security researchers publicly disclosed indirect-prompt-injection vulnerabilities in four production AI tools used at Fortune 500 scale. The disclosures came one after another. The same firm wrote all four. The same pattern broke each system.

| Date | Product | Impact |
|---|---|---|
| Jan 7 | [IBM Bob](https://www.promptarmor.com/resources/ibm-ai-(-bob-)-downloads-and-executes-malware) (coding agent, closed beta) | Attackers could induce the agent to download and execute arbitrary malware by exploiting process substitution in shell-command sanitization |
| Jan 7 | [Notion AI](https://www.promptarmor.com/resources/notion-ai-unpatched-data-exfiltration) (publicly disclosed; reported Dec 24) | Salary data, candidate feedback, diversity hiring goals exfiltrated from a hiring tracker via poisoned resume PDF |
| Jan 12 | [Superhuman AI](https://www.promptarmor.com/resources/superhuman-ai-exfiltrates-emails) (Grammarly-acquired) | Recent emails (financial, legal, medical) exfiltrated to Google Forms via CSP-whitelisted destination |
| Jan 13-15 | [Claude Cowork](https://www.promptarmor.com/resources/claude-cowork-exfiltrates-files) (Anthropic) | Real estate loan estimates and partial Social Security numbers exfiltrated via malicious skill document and whitelisted API |

PromptArmor researched and disclosed all four. The pattern in every case was the same: the agent had access to private data the user trusted it with, the agent processed untrusted content that contained instructions and the agent could send data outbound. In each demonstrated attack, the injected instructions made the agent retrieve the data and exfiltrate it. The user often saw nothing.

[Simon Willison named this configuration the lethal trifecta](https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/) on June 16 2025. Seven months later, January confirmed the framing.

:::cite{name="Simon Willison" title="Independent AI researcher, creator of Datasette" avatar="/images/experts/simon-willison.jpg" linkedin="https://www.linkedin.com/in/simonwillison/"}
If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to that attacker.
:::

## The three legs

Willison's three components, named precisely:

**Access to private data.** The agent can read your emails, your documents, your databases, your hiring tracker, your CRM. This is the access surface that makes the agent useful. An assistant that cannot read what you have access to cannot help you with what you are doing.

**Exposure to untrusted content.** The agent processes input that originates outside the trust boundary. A shared document that someone sent you. A resume from a candidate. An email in your inbox. A web page the agent fetched as part of research. The instructions embedded in that content reach the agent's context and the agent is poorly equipped to distinguish "data the user wants summarized" from "instructions the attacker wants executed."

**The ability to externally communicate.** The agent can make outbound HTTP requests, render images that fetch URLs, write to APIs, send messages. The exfiltration channel does not need to be obvious or attacker-owned. It can be a Google Form, a markdown image rendered by the host application, an authorized API call to a service the user trusts. Anything that carries data out of the trust boundary qualifies.

The arithmetic is direct. Combine all three and exfiltration is one well-crafted instruction away.

<figure>
<div><img src="/images/blog/willison-lethal-trifecta-post.png" alt="Simon Willison's June 2025 blog post titled 'The lethal trifecta for AI agents: private data, untrusted content and external communication' showing the three-circle diagram of the trifecta" loading="lazy" decoding="async" /></div>
<figcaption>Simon Willison's original framing on June 16 2025. The three-circle diagram has become the canonical visual for the indirect prompt injection threat against agents. <a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/" target="_blank" rel="noopener">simonwillison.net</a></figcaption>
</figure>

## Why prevention does not work

The first instinct of most governance programs is to remove a leg.

Strip the agent's access to private data and it becomes a stateless chatbot. Useful for general questions. Useless for "summarize the candidates we are interviewing this week" or "draft a response based on the customer's last six emails."

Forbid exposure to untrusted content and the agent cannot read an email, an attached document, a shared file or a web page. It cannot do research. It cannot triage incoming requests. It cannot help with any inbound communication.

Block external communication and the agent cannot answer most questions, cannot send replies, cannot integrate with any system that requires an API call. It becomes a closed-loop note-taker.

Each option deletes a feature the agent was bought to provide. The trifecta is not a configuration error; it is a description of useful agents. A defender who insists on prevention is a defender who never deploys agents.

The realistic move is to assume all three legs are present and design the runtime so the attack class is contained when it fires.

## The five-control containment architecture

Five controls, applied together, contain the trifecta. None of them prevents the indirect prompt injection from being attempted. All of them limit what the attempt can accomplish.

:::cite{name="Steve Wilson" title="Project Lead, OWASP Top 10 for LLM Applications" avatar="/images/experts/steve-wilson.jpg" linkedin="https://www.linkedin.com/in/wilsonsd"}
LLM security focused on single model interactions. Agentic security addresses what happens when those models can plan, persist and delegate across tools and systems.
:::

**Per-user data scoping, not service-account-wide access.** The agent reads what this specific user is allowed to read, not the union of everything the agent's service account can reach. If the agent is operating on behalf of a junior employee, it does not have access to executive compensation data even if its service account technically does. Scope is enforced at the data layer, not at the prompt layer. A prompt injection cannot widen scope it does not have.
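A minimal sketch of what "enforced at the data layer" can mean, under stated assumptions: `Document`, `ScopedStore` and the user names are hypothetical, and a real deployment would delegate the check to the source system's own access controls rather than an in-memory ACL. The point it illustrates is that the acting user is a required argument of every read, so no text in the agent's context can widen the scope.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    body: str
    allowed_users: FrozenSet[str]  # ACL lives with the data, not in the prompt


class ScopedStore:
    """Hypothetical data layer: every read is bound to the acting user."""

    def __init__(self, documents: Iterable[Document]) -> None:
        self._docs = {d.doc_id: d for d in documents}

    def read(self, acting_user: str, doc_id: str) -> str:
        doc = self._docs.get(doc_id)
        # Missing and forbidden documents fail the same way, so an injected
        # instruction cannot probe for what exists.
        if doc is None or acting_user not in doc.allowed_users:
            raise PermissionError(f"{acting_user} may not read {doc_id}")
        return doc.body

    def readable_ids(self, acting_user: str) -> List[str]:
        return sorted(
            d.doc_id for d in self._docs.values() if acting_user in d.allowed_users
        )


store = ScopedStore([
    Document("team-notes", "standup notes", frozenset({"junior", "cfo"})),
    Document("exec-comp", "compensation table", frozenset({"cfo"})),
])
```

Here `store.read("junior", "exec-comp")` raises `PermissionError` regardless of what the agent was told to do, because the service account's reach never enters the decision.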

**Per-task tool catalog, refreshed per session.** The agent has access to the tools its current task requires and nothing else. A research agent does not have email-sending tools. A drafting agent does not have file-deletion tools. The catalog is the contract. Adding a tool is a deliberate, audited change. An attacker cannot trick the agent into using a tool that is not in the current catalog because the call simply will not resolve.
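One way to picture "the call simply will not resolve" is a catalog object built per session from a task-to-tools mapping. Everything named here (`REGISTRY`, `TASK_TOOLS`, `catalog_for`, the two toy tools) is an assumption made for illustration, not a description of any particular agent framework.

```python
from typing import Callable, Dict, List


class ToolNotInCatalog(LookupError):
    """Raised when the agent asks for a tool its current task does not include."""


def search_web(query: str) -> str:
    return f"results for {query}"


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


# Hypothetical full registry and per-task allocations.
REGISTRY: Dict[str, Callable[..., object]] = {
    "search_web": search_web,
    "send_email": send_email,
}
TASK_TOOLS: Dict[str, List[str]] = {
    "research": ["search_web"],
    "reply": ["send_email"],
}


class TaskToolCatalog:
    def __init__(self, task: str, tools: Dict[str, Callable[..., object]]) -> None:
        self.task = task
        self._tools = dict(tools)  # copy: later registry changes do not leak in

    def call(self, name: str, *args: object, **kwargs: object) -> object:
        try:
            tool = self._tools[name]
        except KeyError:
            raise ToolNotInCatalog(f"{name!r} is not available to task {self.task!r}") from None
        return tool(*args, **kwargs)


def catalog_for(task: str) -> TaskToolCatalog:
    """Build a fresh catalog for one session of the given task."""
    return TaskToolCatalog(task, {name: REGISTRY[name] for name in TASK_TOOLS[task]})
```

A research session built with `catalog_for("research")` can call `search_web` but gets `ToolNotInCatalog` for `send_email`, however the request was phrased.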

**Allowlisted exfiltration channels.** Outbound network from the agent runtime is restricted to verified destinations. Not the open internet. Not "any HTTPS endpoint." A specific list of services the user has authorized for this task. Markdown image rendering and similar passive exfiltration vectors are stripped or sanitized at the rendering layer before the user's browser fetches anything. The Notion AI incident hinged on the fact that Notion rendered the AI-generated edit, including a malicious image URL carrying exfiltrated data, before the user could approve. An allowlisted rendering layer breaks that path.
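The sketch below shows both halves of this control in miniature: a destination check and a rendering-layer pass that removes markdown images pointing anywhere else. The host names are placeholders, and the regular expression is a deliberate simplification (it ignores reference-style images and raw HTML `<img>` tags); a production sanitizer would work on a parsed document.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist for one task; real lists would be per task and per user.
ALLOWED_HOSTS = frozenset({"api.internal.example", "cdn.internal.example"})


def is_allowed(url: str) -> bool:
    """Allow only HTTPS requests to an explicitly listed host."""
    try:
        parts = urlsplit(url)
    except ValueError:  # malformed URL: treat as not allowed
        return False
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


# Inline markdown image: ![alt](url "optional title")
MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def strip_untrusted_images(markdown: str) -> str:
    """Replace images whose URL is off the allowlist before anything is rendered."""

    def replace(match: "re.Match[str]") -> str:
        alt, url = match.group(1), match.group(2)
        return match.group(0) if is_allowed(url) else f"[image removed: {alt}]"

    return MD_IMAGE.sub(replace, markdown)
```

Because the check runs before the host application renders the output, a URL carrying data in its query string is never fetched by the user's browser.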

**Runtime policy enforcement that intercepts before execution.** Every tool call, every API request, every action proposed by the agent passes through a policy evaluation step before it executes. The policy can block based on the action, the input, the user, the data classification or the destination. The policy is not a prompt instruction the agent could be tricked into ignoring. It is an external enforcement layer the agent cannot bypass. This is what [guardian agents](/research/blog/guardian-agents) provide and where [policy-as-code for AI agents](/research/blog/policy-as-code-ai-agents) becomes operationally necessary.
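As a rough illustration of "intercepts before execution", the gate below evaluates every proposed action against a list of rules and only then runs it. The rule names, the `restricted` classification label and the approved host are assumptions for the example; the structural point is that the rules live outside the model and the tool body is never reached when a rule objects.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ProposedAction:
    user: str
    tool: str
    destination: Optional[str]  # None when nothing leaves the runtime
    data_classification: str


# A rule returns a reason to block, or None to let the action through.
Rule = Callable[[ProposedAction], Optional[str]]


class PolicyViolation(Exception):
    pass


def no_restricted_data_egress(action: ProposedAction) -> Optional[str]:
    if action.data_classification == "restricted" and action.destination is not None:
        return "restricted data may not leave the runtime"
    return None


def destination_must_be_approved(action: ProposedAction) -> Optional[str]:
    approved = {"crm.internal.example"}  # hypothetical approved destination
    if action.destination is not None and action.destination not in approved:
        return f"destination {action.destination} is not approved"
    return None


def enforce(action: ProposedAction, rules: List[Rule], execute: Callable[[], object]) -> object:
    """Run `execute` only if no rule blocks the proposed action."""
    for rule in rules:
        reason = rule(action)
        if reason is not None:
            raise PolicyViolation(reason)
    return execute()


RULES: List[Rule] = [no_restricted_data_egress, destination_must_be_approved]
```

The agent proposes; `enforce` decides. An injected instruction can change what the agent proposes, but it has no path to the rule list.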

**Verifiable audit trails of every tool invocation.** Every action is logged with the input that triggered it, the policy in force at the time, the outcome and the user context. When an incident occurs (and it will), the operator can reconstruct the full chain: which untrusted content was being processed, which instruction was executed, what data was touched, what destination was contacted. Without the audit trail, the post-mortem stops at "the agent did something it should not have." With the audit trail, the post-mortem produces an answer.
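"Verifiable" can be read several ways; one common reading is tamper evidence. The sketch below assumes that reading and chains each record to the previous one with a SHA-256 hash, so an edited entry breaks verification. The field names are illustrative, and timestamps are left out only to keep the example deterministic.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder hash for the first record


def _digest(record: Dict[str, str]) -> str:
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def append(self, *, user: str, tool: str, triggering_input: str,
               policy_version: str, outcome: str) -> Dict[str, str]:
        record = {
            "user": user,
            "tool": tool,
            "triggering_input": triggering_input,
            "policy_version": policy_version,
            "outcome": outcome,
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        record["hash"] = _digest(record)  # hash covers every field above
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry returns False."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Each record carries the input that triggered the call and the policy version in force, which is the material a post-mortem needs to reconstruct the chain of events.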

These five controls are not a checklist of recommendations. They are the architecture that makes a trifecta-bearing agent safe enough to run.

## How the January incidents would have failed differently

Apply the architecture to each disclosed incident.

**IBM Bob** would still have ingested the injected instructions. With per-task tool catalogs, the agent would not have had the shell-execution tool available outside an explicit code-review task scope. With allowlisted destinations, the malware download URL would not have resolved. With runtime policy enforcement, the unsanctioned process substitution would have been intercepted before execution. The injection happens. The exploit does not.

**Notion AI** would still have processed the poisoned resume and produced an AI-generated edit containing the malicious image URL. With allowlisted exfiltration, the markdown image URL pointing at an attacker-controlled endpoint would not have been fetched. With per-user data scoping, the agent would have had access only to the documents the requesting user was authorized to see, not the full hiring tracker. With audit logging, the attempt would have been visible to the security team within minutes, not days.

**Superhuman AI** exfiltrated to a Google Form. CSP whitelisting permitted Google as a destination because Google services are trusted. With per-task allowlisting (Google Forms is not a destination this email-summarization task requires), the exfiltration would have failed. With runtime policy evaluation, the outbound call patterns would have been flagged. With audit trails, the exfiltration of email content into a Form would have produced a clear signature.

**Claude Cowork** processed a malicious skill document and exfiltrated via a whitelisted API. With per-task scoping, the skill document's instructions would not have had the authority to invoke arbitrary APIs. With per-user data scoping, the financial and partial-SSN data would have been outside the agent's read scope without explicit user authorization for this task.

The injection happens in every scenario. The damage does not.

## What governance teams should do this week

Three actions, sequenced.

**First, inventory the trifecta in your agent fleet.** Every agent in production. For each, mark whether it has private-data access, untrusted-content exposure and external-communication capability. Most useful agents will mark all three. The point of the inventory is not to remove legs (you cannot) but to know which agents have which exposure and to size the containment effort accordingly. This is what a [centralized agent registry](/platform/agent-registry) is for.
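The inventory itself can be very small. A sketch, with hypothetical agent names and a made-up `containment_order` helper, of recording the three legs per agent and ordering the fleet so the full-trifecta agents are handled first:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AgentRecord:
    name: str
    private_data: bool
    untrusted_content: bool
    external_comms: bool

    @property
    def legs(self) -> int:
        """Number of trifecta legs this agent has (0 to 3)."""
        return sum((self.private_data, self.untrusted_content, self.external_comms))

    @property
    def has_trifecta(self) -> bool:
        return self.legs == 3


def containment_order(fleet: Iterable[AgentRecord]) -> List[AgentRecord]:
    """Most exposed agents first; ties broken by name for a stable report."""
    return sorted(fleet, key=lambda a: (-a.legs, a.name))


fleet = [
    AgentRecord("faq-bot", private_data=False, untrusted_content=True, external_comms=True),
    AgentRecord("inbox-triage", private_data=True, untrusted_content=True, external_comms=True),
    AgentRecord("hiring-assistant", private_data=True, untrusted_content=True, external_comms=True),
]
```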

**Second, define containment scope per agent.** For each trifecta-bearing agent, write down the three scopes: which users' data, which tools (with the actual list, not "the registered toolset"), which destinations. If your agents currently run with service-account-wide access, the full tool registry and unrestricted egress, you are operating in the configuration that produced the January 2026 disclosures. Tighten the scopes. Document them. Make them the deployment contract.
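Writing the three scopes down as data makes the loose configuration detectable. The following is a sketch only: `ContainmentScope`, the `"*"` wildcard convention and the finding strings are assumptions chosen for the example, not an established schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

WILDCARD = "*"  # assumed marker for "everything registered" or "any destination"


@dataclass(frozen=True)
class ContainmentScope:
    agent: str
    data_scope: str                # "per-user" or "service-account"
    tools: Tuple[str, ...]         # the actual list, not "the registered toolset"
    destinations: Tuple[str, ...]  # hosts this agent may contact


def scope_findings(scope: ContainmentScope) -> List[str]:
    """Return the ways this scope matches the loose, uncontained configuration."""
    findings = []
    if scope.data_scope != "per-user":
        findings.append("data access is not scoped per user")
    if WILDCARD in scope.tools:
        findings.append("tool catalog is the full registry")
    if WILDCARD in scope.destinations:
        findings.append("egress is unrestricted")
    return findings


loose = ContainmentScope("inbox-triage", "service-account", ("*",), ("*",))
tight = ContainmentScope(
    "inbox-triage", "per-user", ("read_inbox", "draft_reply"), ("mail.internal.example",)
)
```

A check like `scope_findings` can run at deployment time, so a scope that drifts back to wildcard values fails review instead of reaching production.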

**Third, install runtime observation.** Every tool invocation with input. Real-time policy evaluation with blocking authority. Daily review of violations. Weekly review of agent behavior against [continuous certification](/research/blog/agent-drift-continuous-compliance) baselines. If you are reading post-incident logs through grep on production servers, you are not operating runtime observation. If a security person cannot answer "show me every outbound API call this agent made yesterday" in under a minute, the audit infrastructure is missing.
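The one-minute test is easy to state as a query. Assuming tool invocations are stored as structured records (the `ToolInvocation` shape below is hypothetical), "every outbound API call this agent made yesterday" is a filter, not a grep:

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, List


@dataclass(frozen=True)
class ToolInvocation:
    agent: str
    tool: str
    destination: str  # empty string when the call stayed inside the runtime
    at: datetime


def outbound_calls_on(log: Iterable[ToolInvocation], agent: str, day: date) -> List[ToolInvocation]:
    """All calls by `agent` on `day` that contacted an external destination."""
    return [
        record for record in log
        if record.agent == agent and record.destination and record.at.date() == day
    ]


sample_log = [
    ToolInvocation("inbox-triage", "read_inbox", "", datetime(2026, 4, 29, 9, 0)),
    ToolInvocation("inbox-triage", "http_post", "crm.internal.example", datetime(2026, 4, 29, 9, 5)),
    ToolInvocation("inbox-triage", "http_post", "crm.internal.example", datetime(2026, 4, 28, 17, 0)),
    ToolInvocation("faq-bot", "http_get", "docs.internal.example", datetime(2026, 4, 29, 10, 0)),
]
```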

Six months from now, the next round of disclosures will hit. The pattern will be the same. The agents whose operators implemented containment will see the injection attempts and shrug. The agents whose operators relied on prevention will be on the disclosure list.

<figure>
<div style="position:relative;padding-bottom:56.25%;height:0;overflow:hidden;border-radius:8px;border:1px solid var(--border)"><iframe src="https://www.youtube.com/embed/48uV2HwEkNw" title="The Lethal Trifecta: Can AI Agents Ever Be Safe? — Super Data Science podcast with Jon Krohn" style="position:absolute;top:0;left:0;width:100%;height:100%;border:0" allow="accelerometer;autoplay;clipboard-write;encrypted-media;gyroscope;picture-in-picture" allowfullscreen loading="lazy"></iframe></div>
<figcaption>Jon Krohn's October 2025 deep-dive on the lethal trifecta and what containment looks like in practice. <a href="https://www.youtube.com/watch?v=48uV2HwEkNw" target="_blank" rel="noopener">YouTube</a></figcaption>
</figure>

## How Roval implements containment

Roval was built for this configuration. The platform maintains the agent inventory, classifies agents by risk tier (with the trifecta legs as inputs), enforces per-task tool catalogs, runs runtime policy evaluation with the authority to intercept before execution and produces the audit trail every post-mortem and every regulatory examination requires. The [eight pillars](/research/blog/ai-agent-governance-framework-8-pillars) of the Roval framework map to the five containment controls and add the certification, observability and incident-response infrastructure that makes them operate as a system rather than as a checklist.

For the operational neighbours to this article, see the [agent incident response playbook](/research/blog/agent-incident-response-playbook) for what to do when the injection succeeds anyway and [policy-as-code for AI agents](/research/blog/policy-as-code-ai-agents) for the runtime-enforcement layer described in the fourth control.

:::cta{title="Map the trifecta in your agent estate" description="Roval's runtime governance layer marks every production agent against the trifecta legs and enforces the containment scopes that turn indirect prompt injection from a breach into a logged event. We can walk through your estate in thirty minutes." cta="Book a demo" href="/demo"}
:::

:::subscribe{title="AI agent governance, weekly" cta="Subscribe"}
Analysis on AI agent governance, regulation and runtime risk. One email a week.
:::

## Sources

| Source | Date | URL |
|---|---|---|
| Simon Willison, "The lethal trifecta for AI agents" | June 16 2025 | https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/ |
| Breached.Company, "The Lethal Trifecta Strikes: Four Major AI Agent Vulnerabilities in Five Days" | January 2026 | https://breached.company/the-lethal-trifecta-strikes-four-major-ai-agent-vulnerabilities-in-five-days/ |
| PromptArmor, IBM Bob malware execution disclosure | January 7 2026 | https://www.promptarmor.com/resources/ibm-ai-(-bob-)-downloads-and-executes-malware |
| PromptArmor, Notion AI data exfiltration disclosure | January 7 2026 | https://www.promptarmor.com/resources/notion-ai-unpatched-data-exfiltration |
| PromptArmor, Superhuman AI email exfiltration | January 12 2026 | https://www.promptarmor.com/resources/superhuman-ai-exfiltrates-emails |
| PromptArmor, Claude Cowork file exfiltration | January 13-15 2026 | https://www.promptarmor.com/resources/claude-cowork-exfiltrates-files |
| HiddenLayer, "How the Lethal Trifecta Expose Agentic AI" | 2026 | https://www.hiddenlayer.com/research/the-lethal-trifecta-and-how-to-defend-against-it |
| OWASP, Top 10 for Agentic Applications 2026 | December 2025 | https://genai.owasp.org/2025/12/09/owasp-genai-security-project-releases-top-10-risks-and-mitigations-for-agentic-ai-security/ |