---
title: "When an AI agent causes harm: the incident response playbook"
date: 2026-04-16
author: david
excerpt: "97% of enterprises expect a material AI agent security incident within the next 12 months. Only 20% have tested an incident response plan. This playbook covers the containment protocols, forensic procedures and regulatory notification timelines that your existing runbook does not."
category: operations
tags:
  - incident response
  - security
  - compliance
  - forensics
  - agent governance
draft: false
tldr: "Traditional incident response playbooks were written for deterministic software. AI agents are non-deterministic, multi-step and interconnected. When an agent causes harm, you need containment protocols that account for cascading failures, forensic procedures that can reconstruct agent decision chains and regulatory notification timelines that start ticking the moment you become aware. This playbook covers all three."
seo:
  title: "AI agent incident response playbook: containment, forensics, recovery"
  description: "A complete incident response playbook for AI agent failures covering severity classification, containment protocols, forensic evidence preservation, regulatory notification timelines and post-incident governance hardening."
faqs:
  - question: "How is an AI agent incident different from a traditional security incident?"
    answer: "AI agent incidents are non-deterministic: the same input can produce different outputs at different times. They cascade through multi-agent chains in ways that are not visible in traditional monitoring. The root cause is often not a line of code but a shift in model behavior, context drift or tool misuse. Traditional forensics (network logs, authentication events) miss the agent-specific signals."
  - question: "What is the kill switch protocol for an AI agent?"
    answer: "Containment has three levels. Level 1 (Throttle) reduces the agent's request rate, routes new requests to a fallback and preserves its state for forensics. Level 2 (Isolate) revokes the agent's outbound API credentials and removes it from production routing while its infrastructure stays running. Level 3 (Kill) terminates compute, revokes all credentials and blocks network access. Use the lowest level that stops the harm and go straight to Level 3 only for active data exfiltration, financial loss or safety risk."
  - question: "What are the regulatory notification timelines for AI agent incidents?"
    answer: "Under GDPR Article 33, personal data breaches must be reported to the supervisory authority within 72 hours of awareness. Under the EU AI Act Article 73, serious incidents involving high-risk AI systems must be reported no later than 15 days after becoming aware, with shorter limits of 2 days for widespread infringements or serious disruption of critical infrastructure and 10 days for incidents involving a death."
  - question: "How do you perform forensics on an AI agent decision chain?"
    answer: "Preserve the full state before any containment action: input logs, output logs, tool call sequences, API responses, memory/context state and model version. Reconstruct the decision chain step by step, identifying where the agent's behavior diverged from expected. In multi-agent systems, trace the cascade path to identify the originating failure."
  - question: "What should happen after an AI agent incident is resolved?"
    answer: "Conduct a blameless post-incident review within 5 business days. Identify the root cause (agent drift, policy gap, missing monitoring, credential issue). Implement governance hardening: update policies, add monitoring rules, adjust risk classifications and run a tabletop exercise for similar scenarios. Document everything for regulatory evidence."
  - question: "How do you classify the severity of an AI agent incident?"
    answer: "Use a four-tier system: Critical (active data exfiltration, financial loss, safety risk), High (compliance violation, PII exposure, multi-agent cascade), Medium (single-agent policy violation, degraded output quality, unauthorized data access without exfiltration), Low (configuration drift, minor policy deviation, performance degradation)."
---

At 2:14 AM on a Wednesday, a customer service agent at a fintech startup began approving refund requests it should have escalated to a human reviewer. The agent had been running correctly for three months. A model update the previous afternoon shifted its confidence thresholds just enough that edge cases flipped from "escalate" to "approve." By the time the engineering team noticed at 8 AM, the agent had processed 847 refunds totaling $340,000.

The team's incident response plan covered server outages, database failures and DDoS attacks. It said nothing about what to do when an AI agent starts making wrong decisions that look correct to every monitoring system in the pipeline.

This is the gap. [IBM's 2025 report](https://newsroom.ibm.com/2025-07-30-ibm-report-13-of-organizations-reported-breaches-of-ai-models-or-applications,-97-of-which-reported-lacking-proper-ai-access-controls) found that 13% of organizations have already experienced breaches of AI models or applications. Of those, 97% reported lacking proper AI access controls. The AI Incident Database recorded [362 incidents in 2025](https://adversa.ai/blog/adversa-ai-unveils-explosive-2025-ai-security-incidents-report-revealing-how-generative-and-agentic-ai-are-already-under-attack/), up 55% from 233 in 2024.

Your existing incident response playbook was written for deterministic software. Agents are not deterministic. They need their own playbook.

## What makes agent incidents different

Traditional incident response assumes a cause-and-effect chain: a vulnerability is exploited, a system is compromised, damage is contained, the vulnerability is patched. The root cause is a specific code path, configuration error or credential exposure.

Agent incidents break these assumptions in four ways:

**Non-determinism.** The same prompt can produce different outputs at different times. As Microsoft's incident response team [puts it](https://www.microsoft.com/en-us/security/blog/2026/04/15/incident-response-for-ai-same-fire-different-fuel/): "The root cause is not a line of code; it is a probability distribution shaped by training data, context windows and user inputs." You cannot count on reproducing the failure by replaying the same input.

**Cascading multi-agent failures.** A wrong tool argument at step two of a multi-step agent workflow can silently corrupt every subsequent step. In [multi-agent systems](/research/blog/multi-agent-governance), one agent's error propagates through downstream agents before anyone detects the originating failure. OWASP classifies this as [ASI08: Cascading Failures](https://adversa.ai/blog/cascading-failures-in-agentic-ai-complete-owasp-asi08-security-guide-2026/).

**Speed of harm.** A single gap in an agent's decision logic does not leak one record or approve one bad request. It can produce thousands of harmful outputs before a human reviewer spots the first. In the refund example above, under six hours between the first wrong approval and detection was enough for 847 refunds to go through.

**Invisible telemetry gaps.** Traditional monitoring tracks network traffic, authentication events and system errors. It does not track agent decision confidence scores, tool call sequences or behavioral drift from approved baselines. If your [observability infrastructure](/platform/observer) does not cover agent-specific signals, the incident is invisible until a customer complains.
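
One agent-specific signal that would have surfaced the refund scenario is the escalation rate itself. The sketch below is a minimal illustration in Python, not a reference implementation: the function name, the `"escalate"`/`"approve"` labels, the 10% baseline and the 50% tolerance are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DriftResult:
    baseline_rate: float
    observed_rate: float
    drifted: bool


def check_escalation_drift(
    decisions: list[str],
    baseline_rate: float,
    tolerance: float = 0.5,
) -> DriftResult:
    """Flag drift when the escalation rate moves too far from its baseline.

    decisions: recent agent decisions, each "escalate" or "approve" (assumed labels).
    tolerance: allowed relative change; 0.5 means +/-50% of baseline (assumed value).
    """
    if not decisions:
        raise ValueError("need at least one decision to measure drift")
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    observed = sum(d == "escalate" for d in decisions) / len(decisions)
    relative_change = abs(observed - baseline_rate) / baseline_rate
    return DriftResult(baseline_rate, observed, relative_change > tolerance)


# Hypothetical window: the agent used to escalate 10% of requests, now only 2%.
recent = ["approve"] * 98 + ["escalate"] * 2
result = check_escalation_drift(recent, baseline_rate=0.10)
print(result.drifted)
```

A check like this only helps if it runs on a short window after every model or configuration change, which is when the baseline is most likely to move.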

:::fact[The readiness gap]{description="97% expect an incident, only 20% have tested a response plan"}
97% of enterprise respondents expect a material AI-agent-driven security or fraud incident within the next 12 months, yet only 6% of security budgets are allocated to AI agent risk. Only 20% of organizations have tested an AI-specific incident response plan.

Source: [Grant Thornton, 2026](https://www.grantthornton.com/services/advisory-services/artificial-intelligence/2026-ai-impact-survey)
:::

## Severity classification

Before you can respond, you need to classify. Not every agent anomaly is an incident and not every incident is critical. Use a four-tier system that maps to your existing severity framework.

### Critical (P0)

Active data exfiltration. Financial loss exceeding your predefined materiality threshold. Safety risk to individuals. Multi-agent cascade with expanding blast radius. Agent acting outside all policy boundaries.

**Response window:** Immediate. All-hands. Executive notification within 30 minutes.

### High (P1)

Compliance violation with regulatory notification implications. PII exposure without confirmed exfiltration. [Multi-agent cascade](/research/blog/multi-agent-governance) contained to one workflow. Unauthorized access to restricted data sources.

**Response window:** Within 1 hour. Incident commander assigned. Legal and compliance notified.

### Medium (P2)

Single-agent policy violation. Degraded output quality affecting business decisions. Unauthorized access to non-restricted data, without exfiltration. [Agent drift](/research/blog/agent-drift-continuous-compliance) beyond approved behavioral baseline.

**Response window:** Within 4 hours. Agent owner and governance team notified.

### Low (P3)

Configuration drift detected by automated monitoring. Minor policy deviation caught before impact. Performance degradation within acceptable bounds.

**Response window:** Next business day. Tracked in governance backlog.
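
The four tiers can be encoded so triage gives the same answer regardless of who is on call. The following is a rough sketch under stated assumptions: the `Signals` fields are hypothetical flags that paraphrase the criteria above, and a real classifier would take them from your own detection pipeline.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    P0 = "Critical"
    P1 = "High"
    P2 = "Medium"
    P3 = "Low"


# Response windows copied from the tiers above.
RESPONSE_WINDOW = {
    Severity.P0: "immediate",
    Severity.P1: "within 1 hour",
    Severity.P2: "within 4 hours",
    Severity.P3: "next business day",
}


@dataclass
class Signals:
    """Hypothetical triage flags; map them to whatever your detection emits."""
    active_exfiltration: bool = False
    loss_over_threshold: bool = False
    safety_risk: bool = False
    cascade_expanding: bool = False
    compliance_violation: bool = False
    pii_exposed: bool = False
    cascade_contained: bool = False
    restricted_data_accessed: bool = False
    policy_violation: bool = False
    degraded_output: bool = False
    drift_beyond_baseline: bool = False


def classify(s: Signals) -> Severity:
    """Check tiers from most to least severe and return the first match."""
    if any([s.active_exfiltration, s.loss_over_threshold, s.safety_risk, s.cascade_expanding]):
        return Severity.P0
    if any([s.compliance_violation, s.pii_exposed, s.cascade_contained, s.restricted_data_accessed]):
        return Severity.P1
    if any([s.policy_violation, s.degraded_output, s.drift_beyond_baseline]):
        return Severity.P2
    return Severity.P3


# The refund scenario: a loss over threshold outranks the policy violation.
tier = classify(Signals(loss_over_threshold=True, policy_violation=True))
print(tier.name, RESPONSE_WINDOW[tier])
```

Checking the most severe tier first means an incident that matches several tiers is always treated at the highest one.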

## Containment protocols

Containment for agent incidents follows a different sequence than traditional incidents. You are not patching a vulnerability. You are stopping an autonomous system that may be actively making decisions while you investigate.

### Level 1: Throttle (minutes)

For P2 and P3 incidents where the agent is producing degraded but not dangerous output:

- Reduce the agent's request rate to minimum viable throughput
- Route new requests to a fallback (human queue, rule-based system or secondary agent)
- Preserve the agent's current state: memory, context window, cached data
- Enable verbose logging for all subsequent agent actions
- Notify the agent owner and governance team

Level 1 keeps the agent running in a restricted mode while you assess the scope. This is the right choice when the agent is misbehaving but not causing active harm.

### Level 2: Isolate (minutes to hours)

For P1 incidents where the agent has violated compliance policies or accessed data it should not have:

- Revoke the agent's outbound API credentials (stop it from calling external systems)
- Remove the agent from production routing (no new requests reach it)
- Preserve all state before any cleanup: input logs, output logs, tool call history, model version, configuration snapshot
- Rotate any shared credentials the agent used
- Notify legal, compliance and the executive sponsor

Level 2 stops the agent from causing further harm while preserving forensic evidence. The agent's infrastructure remains running but disconnected.

### Level 3: Kill (immediate)

For P0 incidents involving active data exfiltration, financial loss or safety risk:

- Terminate the agent's compute infrastructure
- Revoke all credentials, including service accounts, API keys, OAuth tokens, certificates
- Block the agent's network access at the firewall level
- Isolate the agent's data stores from the network
- Notify the incident commander, executive team, legal and external counsel
- Begin regulatory notification clock assessment

Level 3 is destructive. You will lose in-flight state. Use it only when the cost of continued operation exceeds the cost of lost evidence.
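
The mapping from severity to containment level is simple enough to keep as data next to the runbook. This is a sketch only: the dictionary names and `containment_plan` are hypothetical, and the step strings are abbreviated from the lists above rather than calls into any real tooling.

```python
# Severity tier -> containment level, following the three levels above.
LEVEL_FOR_SEVERITY = {"P0": 3, "P1": 2, "P2": 1, "P3": 1}

# Abbreviated steps per level, in the order the playbook lists them.
STEPS = {
    1: [
        "reduce request rate to minimum viable throughput",
        "route new requests to fallback",
        "preserve agent state",
        "enable verbose logging",
        "notify agent owner and governance team",
    ],
    2: [
        "revoke outbound API credentials",
        "remove agent from production routing",
        "preserve all state before cleanup",
        "rotate shared credentials",
        "notify legal, compliance and executive sponsor",
    ],
    3: [
        "terminate compute",
        "revoke all credentials",
        "block network access at the firewall",
        "isolate data stores",
        "notify incident commander, executives, legal and external counsel",
        "begin regulatory notification clock assessment",
    ],
}


def containment_plan(severity: str) -> tuple[int, list[str]]:
    """Return the containment level and its ordered steps for a severity tier."""
    try:
        level = LEVEL_FOR_SEVERITY[severity]
    except KeyError:
        raise ValueError(f"unknown severity: {severity!r}") from None
    return level, STEPS[level]


level, steps = containment_plan("P1")
print(f"Level {level}:")
for step in steps:
    print(f"  - {step}")
```

Keeping the table in version control gives the post-incident review something concrete to change when the response falls short.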

:::cite{name="Phillip Misner" title="Head of AI Incident Detection and Response, Microsoft" linkedin="https://www.linkedin.com/in/phillipmisner/"}
A model may produce harmful output today, but the same prompt tomorrow may produce something different. The root cause is not a line of code; it is a probability distribution shaped by training data, context windows and user inputs.
:::

## Forensic evidence preservation

The most common mistake in agent incident response is fixing the problem before preserving the evidence. Once you restart the agent, update the model or clear the context window, the forensic evidence is gone.

Preserve these artifacts before any remediation:

**Agent state snapshot:**
- Full memory/context window contents
- Vector store state and recent embeddings
- Cached tool outputs and API response history
- Model version and configuration (exact checkpoint, not "latest")
- Environment variables and runtime configuration

**Decision chain reconstruction:**
- Complete input log (every request the agent received)
- Complete output log (every response the agent produced)
- Tool call sequence with timestamps, arguments and responses
- Confidence scores at each decision point (if available)
- Escalation events (decisions the agent referred to humans)

**System context:**
- API gateway logs showing traffic patterns
- Authentication logs for the agent's service accounts
- Network flow data for the agent's communications
- Monitoring alerts and their timestamps
- Configuration changes in the preceding 72 hours
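
Preserved artifacts are more useful as evidence if you can later show they were not altered. One common approach, assumed here rather than prescribed by this playbook, is to record a SHA-256 digest for each artifact at capture time. The function names, artifact names and sample contents below are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_evidence_manifest(artifacts: dict[str, bytes], incident_id: str) -> dict:
    """Record a SHA-256 digest per artifact so later changes are detectable."""
    return {
        "incident_id": incident_id,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "artifacts": {
            name: {"sha256": hashlib.sha256(data).hexdigest(), "bytes": len(data)}
            for name, data in sorted(artifacts.items())
        },
    }


def verify_artifact(manifest: dict, name: str, data: bytes) -> bool:
    """Return True if the artifact still matches the digest captured earlier."""
    expected = manifest["artifacts"][name]["sha256"]
    return hashlib.sha256(data).hexdigest() == expected


# Illustrative artifacts; real ones would be read from logs and state stores.
captured = {
    "tool_calls.jsonl": b'{"tool": "approve_refund", "amount": 900}\n',
    "model_version.txt": b"example-checkpoint-id",
}
manifest = build_evidence_manifest(captured, incident_id="INC-0001")
print(json.dumps(manifest, indent=2))
```

Store the manifest somewhere the agent and its service accounts cannot write to, otherwise the digests prove little.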

For [multi-agent incidents](/research/blog/multi-agent-governance), you need to reconstruct the cascade path. Map the flow: which agent produced the corrupted output, which agents consumed it and how far downstream the corruption traveled. A lineage graph tracking each step (user message, tool call, API response, agent response, downstream consumption) is essential for root cause analysis.
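
If the lineage graph is stored as "producer to consumers" edges, the cascade path is a graph traversal from the originating agent. The sketch below uses a breadth-first walk; the agent names and the dictionary layout are made up for illustration.

```python
from collections import deque


def downstream_of(lineage: dict[str, list[str]], origin: str) -> list[str]:
    """Breadth-first walk: every node that consumed output derived from origin."""
    seen = {origin}
    order: list[str] = []
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for consumer in lineage.get(node, []):
            if consumer not in seen:  # guard against cycles and repeat visits
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order


# Hypothetical lineage: keys produce output that the listed nodes consume.
lineage = {
    "intake-agent": ["pricing-agent", "audit-log"],
    "pricing-agent": ["refund-agent"],
    "refund-agent": ["payments-api"],
    "kyc-agent": ["refund-agent"],
}
print(downstream_of(lineage, "pricing-agent"))  # ['refund-agent', 'payments-api']
```

Breadth-first order lists the nearest consumers first, which is usually the order in which you want to check for corrupted data.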

## Regulatory notification timelines

The clock starts when you become "aware" of an incident, not when you confirm the root cause. Delaying investigation to avoid awareness is not a strategy. It is a compliance violation.

### GDPR Article 33: 72 hours

If the incident involves personal data of EU residents:

- Notify the supervisory authority [within 72 hours](https://gdpr-info.eu/art-33-gdpr/) of becoming aware
- If you miss the deadline, you must explain the delay (late notification is a separate infringement)
- Phased notification is permitted: submit what you know, then supplement
- Document everything: facts, effects, remedial actions
- Fines for notification failures: up to EUR 10 million or 2% of global annual turnover, whichever is higher

### EU AI Act Article 73: 15 days (general), 2 days (widespread)

If the incident involves a high-risk AI system:

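- [Report to the market surveillance authority](https://artificialintelligenceact.eu/article/73/) immediately after establishing a causal link between the AI system and the incident (or the reasonable likelihood of one), and no later than 15 days after becoming aware of it
- For a widespread infringement or a serious, irreversible disruption of critical infrastructure: report no later than 2 days after becoming aware
- For incidents involving death: report no later than 10 days after becoming aware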
- The authority must take appropriate measures within 7 days of notification
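
Because the clock runs from awareness, it helps to compute the outer limits the moment an incident is logged. The sketch below treats each figure from the two sections above as a simple offset from the awareness timestamp. The obligation keys and function name are hypothetical, and it ignores authority-specific counting rules, so treat the output as a planning aid and not legal advice.

```python
from datetime import datetime, timedelta, timezone

# Outer limits from the sections above, counted from the moment of awareness.
LIMITS = {
    "gdpr_art33": timedelta(hours=72),
    "ai_act_general": timedelta(days=15),
    "ai_act_widespread": timedelta(days=2),
    "ai_act_death": timedelta(days=10),
}


def notification_deadlines(aware_at: datetime, obligations: list[str]) -> dict[str, datetime]:
    """Return the latest permissible notification time for each obligation."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    unknown = [o for o in obligations if o not in LIMITS]
    if unknown:
        raise KeyError(f"unknown obligations: {unknown}")
    return {o: aware_at + LIMITS[o] for o in obligations}


# Hypothetical incident: the team became aware at 08:00 UTC on 15 April 2026.
aware_at = datetime(2026, 4, 15, 8, 0, tzinfo=timezone.utc)
deadlines = notification_deadlines(aware_at, ["gdpr_art33", "ai_act_general"])
for name, due in deadlines.items():
    print(f"{name}: notify by {due.isoformat()}")
```

Requiring a timezone-aware timestamp avoids an ambiguous deadline when responders and regulators sit in different time zones.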

### Other frameworks

- **SEC (US public companies):** Material cybersecurity incidents must be reported on Form 8-K within four business days of materiality determination
- **HIPAA (healthcare):** Breaches affecting 500+ individuals require notification without unreasonable delay and no later than 60 days after discovery
- **State laws (US):** Vary by jurisdiction, some as short as 30 days

:::fact[Incident costs are rising]{description="High levels of shadow AI added $670,000 to average breach costs"}
20% of organizations suffered a breach involving shadow AI in 2025, and organizations with high levels of shadow AI saw breach costs average $670,000 more than those with little or none. The AI Incident Database recorded 362 incidents in 2025, up 55% from 233 in 2024.

Source: [IBM, 2025](https://newsroom.ibm.com/2025-07-30-ibm-report-13-of-organizations-reported-breaches-of-ai-models-or-applications,-97-of-which-reported-lacking-proper-ai-access-controls); [Adversa AI, 2025](https://adversa.ai/blog/adversa-ai-unveils-explosive-2025-ai-security-incidents-report-revealing-how-generative-and-agentic-ai-are-already-under-attack/)
:::

## Root cause analysis for agent failures

Agent root cause analysis differs from traditional RCA because the failure mode is often behavioral, not structural.

### The six agent-specific failure modes

Based on the [OWASP Agentic AI taxonomy](https://adversa.ai/blog/cascading-failures-in-agentic-ai-complete-owasp-asi08-security-guide-2026/) and production incident patterns:

- **Tool misuse:** the agent called the right tool with wrong arguments or the wrong tool entirely, common after model updates that shift tool selection probabilities
- **Context loss:** the agent lost critical context mid-workflow due to context window overflow, memory truncation or session state corruption
- **Goal drift:** the agent's objective shifted gradually through feedback loops, fine-tuning or accumulated context that biased its decision-making
- **Retry loops:** the agent encountered an error, retried with the same failing approach and consumed resources or produced duplicate actions
- **Cascading errors:** one agent's corrupted output became another agent's input, amplifying the failure through the chain
- **Silent quality degradation:** output quality declined gradually, with each change too small to trip automated monitoring and not yet bad enough to trigger user complaints

For each failure mode, the RCA process is the same:

- Reconstruct the decision chain from preserved forensic evidence
- Identify the divergence point: where did the agent's behavior deviate from expected?
- Determine the trigger: what changed? Model update, data shift, configuration change or emergent behavior?
- Assess the blast radius: how far did the failure propagate?
- Classify the root cause: was this a [policy gap](/research/blog/policy-as-code-ai-agents), a monitoring gap, a design flaw or an adversarial attack?
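
Finding the divergence point can be partly mechanical when you have a baseline to compare against, such as the same request replayed on the previous model version or the tool sequence the policy expects. The sketch below compares two tool call sequences and returns the first step where they differ. The `ToolCall` shape and the tool names are assumptions made for the example.

```python
from typing import Any, Optional

# A tool call as (tool name, arguments); an assumed shape for this example.
ToolCall = tuple[str, dict[str, Any]]


def divergence_point(expected: list[ToolCall], observed: list[ToolCall]) -> Optional[int]:
    """Index of the first step where observed differs from expected, else None."""
    for i, (want, got) in enumerate(zip(expected, observed)):
        if want != got:
            return i
    if len(expected) != len(observed):
        # One chain stopped early or ran long: divergence starts where the shorter ends.
        return min(len(expected), len(observed))
    return None


# Hypothetical chains for one refund request.
expected = [
    ("lookup_order", {"id": 7}),
    ("check_policy", {"amount": 900}),
    ("escalate", {}),
]
observed = [
    ("lookup_order", {"id": 7}),
    ("check_policy", {"amount": 900}),
    ("approve_refund", {"amount": 900}),
]
print(divergence_point(expected, observed))  # 2
```

The index tells you where to start looking, not why the agent diverged. Determining the trigger still requires the preserved state from that step.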

## Post-incident governance hardening

Every incident is a governance improvement opportunity. The post-incident review should produce specific governance changes, not just a timeline of what happened.

### Within 5 business days: blameless post-incident review

Run the review with all responders, the agent owner and a governance team representative. Document:

- Timeline of events (detection to resolution)
- What worked in the response
- What did not work or was missing
- Root cause and contributing factors
- Blast radius (systems, users, data affected)

### Within 10 business days: governance hardening actions

Based on the RCA, implement:

- **Policy updates:** if the incident exposed a policy gap, update the policy and redeploy via [policy-as-code](/research/blog/policy-as-code-ai-agents)
- **Monitoring additions:** if the incident was invisible to existing monitoring, add the missing signal so every incident reduces future mean time to detect
- **Risk reclassification:** if the agent's actual risk exceeded its classified [risk tier](/research/blog/ai-agent-risk-classification), upgrade it
- **Access control tightening:** if the agent had access it did not need, revoke it and apply least-privilege retroactively
- **Tabletop exercise:** run a simulated version of the incident with the broader team so a slow response gets faster with practice

### Within 30 days: systemic improvements

Look beyond the individual agent:

- Are there other agents with the same vulnerability?
- Does the [governance implementation](/research/blog/agent-governance-implementation-playbook) need a new phase?
- Should the organization's agent [risk classification](/research/blog/ai-agent-risk-classification) criteria be updated?
- Does the incident response plan need revision based on lessons learned?

## The incident response runbook template

Use this as a starting point for your agent-specific incident response plan.

**Detection and triage (first 15 minutes)**
- [ ] Confirm the incident is agent-related (not infrastructure, not human error)
- [ ] Classify severity (P0-P3)
- [ ] Assign incident commander
- [ ] Notify stakeholders per severity level
- [ ] Begin forensic evidence preservation

**Containment (first hour)**
- [ ] Execute containment level appropriate to severity
- [ ] Confirm containment is effective (agent is no longer producing harmful output)
- [ ] Activate fallback for affected workflows
- [ ] Assess regulatory notification obligations

**Investigation (hours 1-24)**
- [ ] Complete forensic evidence collection
- [ ] Reconstruct agent decision chain
- [ ] Identify root cause and contributing factors
- [ ] Map blast radius (systems, data, users affected)
- [ ] Begin regulatory notifications if required

**Recovery (hours 24-72)**
- [ ] Develop and test remediation plan
- [ ] Implement fix in staging environment
- [ ] Validate fix against the original failure scenario
- [ ] Restore agent to production (or confirm decommissioning)
- [ ] Verify monitoring captures the failure mode

**Post-incident (days 3-30)**
- [ ] Conduct blameless post-incident review
- [ ] Publish incident report to stakeholders
- [ ] Implement governance hardening actions
- [ ] Complete regulatory notifications and documentation
- [ ] Update the incident response plan with lessons learned

:::cta{title="Close the detection gap" description="Roval Observer monitors agent behavior in real time, detects policy violations within minutes and triggers containment workflows automatically. Stop finding out about agent incidents from your customers." cta="Book a demo" href="https://roval.ai/demo"}
:::

## Sources

| Source | Date | URL |
|--------|------|-----|
| IBM, AI breaches and access control report | Jul 2025 | https://newsroom.ibm.com/2025-07-30-ibm-report-13-of-organizations-reported-breaches-of-ai-models-or-applications,-97-of-which-reported-lacking-proper-ai-access-controls |
| Adversa AI, 2025 AI security incidents report | 2025 | https://adversa.ai/blog/adversa-ai-unveils-explosive-2025-ai-security-incidents-report-revealing-how-generative-and-agentic-ai-are-already-under-attack/ |
| Microsoft, Incident response for AI | Apr 2026 | https://www.microsoft.com/en-us/security/blog/2026/04/15/incident-response-for-ai-same-fire-different-fuel/ |
| Adversa AI, Cascading failures in agentic AI: OWASP ASI08 security guide | 2026 | https://adversa.ai/blog/cascading-failures-in-agentic-ai-complete-owasp-asi08-security-guide-2026/ |
| CoSAI, AI Incident Response Framework v1.0 | 2025 | https://www.coalitionforsecureai.org/defending-ai-systems-a-new-framework-for-incident-response-in-the-age-of-intelligent-technology/ |
| EU AI Act, Article 73 (serious incident reporting) | 2024 | https://artificialintelligenceact.eu/article/73/ |
| GDPR, Article 33 (breach notification) | 2018 | https://gdpr-info.eu/art-33-gdpr/ |
| Grant Thornton, 2026 AI Impact Survey | 2026 | https://www.grantthornton.com/services/advisory-services/artificial-intelligence/2026-ai-impact-survey |