How is an AI agent incident different from a traditional security incident?

AI agent incidents are non-deterministic: the same input can produce different outputs at different times. They cascade through multi-agent chains in ways that are not visible in traditional monitoring. The root cause is often not a line of code but a shift in model behavior, context drift or tool misuse. Traditional forensics (network logs, authentication events) miss the agent-specific signals.

What is the kill switch protocol for an AI agent?

The kill switch has three levels. Level 1 (Throttle) cuts the agent to minimum throughput, routes new requests to a fallback and preserves its state for forensics. Level 2 (Isolate) revokes the agent's outbound API credentials and removes it from production routing while its infrastructure stays up. Level 3 (Kill) terminates compute, revokes all credentials and isolates storage and network access. Start at the lowest level that stops the harm; go straight to Level 3 only for active data exfiltration, financial loss or safety risk.

What are the regulatory notification timelines for AI agent incidents?

Under GDPR Article 33, personal data breaches must be reported to the supervisory authority within 72 hours of awareness. Under the EU AI Act Article 73, serious incidents involving high-risk AI systems must be reported within 15 days, with expedited timelines of 2 days for widespread incidents and 10 days for fatalities.

How do you perform forensics on an AI agent decision chain?

Preserve the full state before any containment action: input logs, output logs, tool call sequences, API responses, memory/context state and model version. Reconstruct the decision chain step by step, identifying where the agent's behavior diverged from expected. In multi-agent systems, trace the cascade path to identify the originating failure.

What should happen after an AI agent incident is resolved?

Conduct a blameless post-incident review within 5 business days. Identify the root cause (agent drift, policy gap, missing monitoring, credential issue). Implement governance hardening: update policies, add monitoring rules, adjust risk classifications and run a tabletop exercise for similar scenarios. Document everything for regulatory evidence.

How do you classify the severity of an AI agent incident?

Use a four-tier system: Critical (active data exfiltration, financial loss, safety risk), High (compliance violation, PII exposure, multi-agent cascade), Medium (single-agent policy violation, degraded output quality, unauthorized data access without exfiltration), Low (configuration drift, minor policy deviation, performance degradation).

When an AI agent causes harm: the incident response playbook

At 2:14 AM on a Wednesday, a customer service agent at a fintech startup began approving refund requests it should have escalated to a human reviewer. The agent had been running correctly for three months. A model update the previous afternoon shifted its confidence thresholds just enough that edge cases flipped from “escalate” to “approve.” By the time the engineering team noticed at 8 AM, the agent had processed 847 refunds totaling $340,000.

The team’s incident response plan covered server outages, database failures and DDoS attacks. It said nothing about what to do when an AI agent starts making wrong decisions that look correct to every monitoring system in the pipeline.

This is the gap. IBM’s 2025 report found that 13% of organizations have already experienced breaches of AI models or applications. Of those, 97% reported lacking proper AI access controls. The AI Incident Database recorded 362 incidents in 2025, up from 233 in 2024, and the trajectory is accelerating.

Your existing incident response playbook was written for deterministic software. Agents are not deterministic. They need their own playbook, embedded inside the operational discipline of AgentOps.

What makes agent incidents different

Traditional incident response assumes a cause-and-effect chain: a vulnerability is exploited, a system is compromised, damage is contained, the vulnerability is patched. The root cause is a specific code path, configuration error or credential exposure.

Agent incidents break these assumptions in four ways:

Non-determinism. The same prompt can produce different outputs at different times. As Microsoft’s incident response team puts it: “The root cause is not a line of code; it is a probability distribution shaped by training data, context windows and user inputs.” Replaying the exact input does not reliably reproduce the exact failure.

Cascading multi-agent failures. A wrong tool argument at step two of a multi-step agent workflow can silently corrupt every subsequent step. In multi-agent systems, one agent’s error propagates through downstream agents before anyone detects the originating failure. OWASP classifies this as ASI08: Cascading Failures.

Speed of harm. A single classifier gap does not leak one record. It can produce thousands of harmful outputs before human reviewers detect the first one. Even when the window between first failure and first detection is measured in hours, not days, an agent working at machine speed can do substantial damage inside it.

Invisible telemetry gaps. Traditional monitoring tracks network traffic, authentication events and system errors. It does not track agent decision confidence scores, tool call sequences or behavioral drift from approved baselines. If your observability infrastructure does not cover agent-specific signals, the incident is invisible until a customer complains.
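
One agent-specific signal of this kind is the share of decisions the agent escalates to a human. The sketch below compares a recent window against an approved baseline and flags a sharp drop, which is the pattern in the refund example above. The function names, the `"escalate"` label and the 50% threshold are illustrative assumptions, not values from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftResult:
    baseline_rate: float
    current_rate: float
    drifted: bool


def escalation_rate(decisions: list[str]) -> float:
    """Fraction of decisions that were escalated to a human reviewer."""
    if not decisions:
        return 0.0
    return sum(1 for d in decisions if d == "escalate") / len(decisions)


def check_escalation_drift(
    baseline: list[str],
    current: list[str],
    max_relative_drop: float = 0.5,  # assumed threshold; tune per agent
) -> DriftResult:
    """Flag the agent when its escalation rate falls too far below baseline."""
    base = escalation_rate(baseline)
    cur = escalation_rate(current)
    # Relative drop: how much of the baseline escalation rate has disappeared.
    drifted = base > 0 and (base - cur) / base > max_relative_drop
    return DriftResult(base, cur, drifted)


if __name__ == "__main__":
    baseline = ["escalate"] * 20 + ["approve"] * 80  # approved behavior
    current = ["escalate"] * 2 + ["approve"] * 98    # after a model update
    print(check_escalation_drift(baseline, current))
```

A check like this would not explain why behavior changed, but it turns a silent shift into an alert that does not depend on a customer complaint.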

The readiness gap

97% of enterprise respondents expect a material AI-agent-driven security or fraud incident within the next 12 months, yet only 6% of security budgets are allocated to AI agent risk. Only 20% of organizations have tested an AI-specific incident response plan.

Source: Grant Thornton, 2026

Severity classification

Before you can respond, you need to classify. Not every agent anomaly is an incident and not every incident is critical. Use a four-tier system that maps to your existing severity framework.

Critical (P0)

Active data exfiltration. Financial loss exceeding thresholds. Safety risk to individuals. Multi-agent cascade with expanding blast radius. Agent acting outside all policy boundaries.

Response window: Immediate. All-hands. Executive notification within 30 minutes.

High (P1)

Compliance violation with regulatory notification implications. PII exposure without confirmed exfiltration. Multi-agent cascade contained to one workflow. Unauthorized access to restricted data sources.

Response window: Within 1 hour. Incident commander assigned. Legal and compliance notified.

Medium (P2)

Single-agent policy violation. Degraded output quality affecting business decisions. Unauthorized data access without exfiltration. Agent drift beyond approved behavioral baseline.

Response window: Within 4 hours. Agent owner and governance team notified.

Low (P3)

Configuration drift detected by automated monitoring. Minor policy deviation caught before impact. Performance degradation within acceptable bounds.

Response window: Next business day. Tracked in governance backlog.
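
The four tiers can be encoded so that triage is consistent across responders. This is a minimal sketch: the `IncidentSignals` field names are hypothetical labels for the criteria listed above, and the rule is simply "highest matching tier wins".

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    P0 = "Critical"
    P1 = "High"
    P2 = "Medium"
    P3 = "Low"


# Response windows as stated in the tier descriptions.
RESPONSE_WINDOW = {
    Severity.P0: "Immediate",
    Severity.P1: "Within 1 hour",
    Severity.P2: "Within 4 hours",
    Severity.P3: "Next business day",
}


@dataclass
class IncidentSignals:
    # Hypothetical flags, one per criterion in the tier descriptions.
    active_exfiltration: bool = False
    financial_loss_over_threshold: bool = False
    safety_risk: bool = False
    expanding_cascade: bool = False
    outside_all_policy: bool = False
    compliance_violation: bool = False
    pii_exposure: bool = False
    contained_cascade: bool = False
    restricted_data_access: bool = False
    single_agent_policy_violation: bool = False
    degraded_output: bool = False
    drift_beyond_baseline: bool = False


def classify(s: IncidentSignals) -> Severity:
    """Return the highest tier whose criteria match; default to P3."""
    if (s.active_exfiltration or s.financial_loss_over_threshold
            or s.safety_risk or s.expanding_cascade or s.outside_all_policy):
        return Severity.P0
    if (s.compliance_violation or s.pii_exposure
            or s.contained_cascade or s.restricted_data_access):
        return Severity.P1
    if (s.single_agent_policy_violation or s.degraded_output
            or s.drift_beyond_baseline):
        return Severity.P2
    return Severity.P3


if __name__ == "__main__":
    sev = classify(IncidentSignals(pii_exposure=True, degraded_output=True))
    print(sev.name, sev.value, RESPONSE_WINDOW[sev])
```

Checking tiers from most to least severe means a single critical signal is never masked by several minor ones.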

Containment protocols

Containment for agent incidents follows a different sequence than traditional incidents. You are not patching a vulnerability. You are stopping an autonomous system that may be actively making decisions while you investigate.

Level 1: Throttle (minutes)

For P2 and P3 incidents where the agent is producing degraded but not dangerous output:

Reduce the agent’s request rate to minimum viable throughput
Route new requests to a fallback (human queue, rule-based system or secondary agent)
Preserve the agent’s current state: memory, context window, cached data
Enable verbose logging for all subsequent agent actions
Notify the agent owner and governance team

Level 1 keeps the agent running in a restricted mode while you assess the scope. This is the right choice when the agent is misbehaving but not causing active harm.

Level 2: Isolate (minutes to hours)

For P1 incidents where the agent has violated compliance policies or accessed data it should not have:

Revoke the agent’s outbound API credentials (stop it from calling external systems)
Remove the agent from production routing (no new requests reach it)
Preserve all state before any cleanup: input logs, output logs, tool call history, model version, configuration snapshot
Rotate any shared credentials the agent used
Notify legal, compliance and the executive sponsor

Level 2 stops the agent from causing further harm while preserving forensic evidence. The agent’s infrastructure remains running but disconnected.

Level 3: Kill (immediate)

For P0 incidents involving active data exfiltration, financial loss or safety risk:

Terminate the agent’s compute infrastructure
Revoke all credentials, including service accounts, API keys, OAuth tokens and certificates
Block the agent’s network access at the firewall level
Isolate the agent’s data stores from the network
Notify the incident commander, executive team, legal and external counsel
Begin regulatory notification clock assessment

Level 3 is destructive. You will lose in-flight state. Use it only when the cost of continued operation exceeds the cost of lost evidence.
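
The three levels can be expressed as an ordered checklist keyed by severity, so responders execute the same steps in the same order under pressure. The sketch below is one way to do that. Every step identifier is a hypothetical label for automation you would supply yourself, and steps with no registered handler are recorded as manual rather than skipped.

```python
from typing import Callable

# Severity-to-level mapping taken from the three level descriptions.
LEVEL_FOR_SEVERITY = {"P3": 1, "P2": 1, "P1": 2, "P0": 3}

# Hypothetical step identifiers, in the order the steps are listed.
CONTAINMENT_STEPS = {
    1: ["reduce_request_rate", "route_to_fallback", "preserve_state",
        "enable_verbose_logging", "notify_owner_and_governance"],
    2: ["revoke_outbound_api_credentials", "remove_from_production_routing",
        "preserve_state_and_logs", "rotate_shared_credentials",
        "notify_legal_compliance_sponsor"],
    3: ["terminate_compute", "revoke_all_credentials",
        "block_network_at_firewall", "isolate_data_stores",
        "notify_commander_exec_legal_counsel", "assess_regulatory_clock"],
}


def run_containment(
    severity: str,
    handlers: dict[str, Callable[[], None]],
) -> tuple[int, list[tuple[str, str]]]:
    """Run each step for the severity's level and return an audit trail."""
    level = LEVEL_FOR_SEVERITY[severity]
    audit: list[tuple[str, str]] = []
    for step in CONTAINMENT_STEPS[level]:
        handler = handlers.get(step)
        if handler is None:
            audit.append((step, "manual"))  # no automation: a human must act
            continue
        try:
            handler()
            audit.append((step, "done"))
        except Exception as exc:  # keep going; one failure must not stop containment
            audit.append((step, f"failed: {exc}"))
    return level, audit


if __name__ == "__main__":
    level, audit = run_containment(
        "P1", {"remove_from_production_routing": lambda: None}
    )
    print(level, audit)
```

The audit trail doubles as evidence of when each containment action was taken, which the post-incident review and any regulatory report will need.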

Forensic evidence preservation

The most common mistake in agent incident response is fixing the problem before preserving the evidence. Once you restart the agent, update the model or clear the context window, the forensic evidence is gone.

Preserve these artifacts before any remediation:

Agent state snapshot:

Full memory/context window contents
Vector store state and recent embeddings
Cached tool outputs and API response history
Model version and configuration (exact checkpoint, not “latest”)
Environment variables and runtime configuration

Decision chain reconstruction:

Complete input log (every request the agent received)
Complete output log (every response the agent produced)
Tool call sequence with timestamps, arguments and responses
Confidence scores at each decision point (if available)
Escalation events (decisions the agent referred to humans)

System context:

API gateway logs showing traffic patterns
Authentication logs for the agent’s service accounts
Network flow data for the agent’s communications
Monitoring alerts and their timestamps
Configuration changes in the preceding 72 hours
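
Preserved artifacts are more useful as evidence if you can later show they were not altered. A minimal sketch of that idea, using only the Python standard library: hash each artifact at capture time, store the manifest separately, and re-verify before analysis. The artifact names and the in-memory `dict` of bytes are illustrative assumptions; in practice the inputs would be files or object-store blobs.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def build_manifest(
    artifacts: dict[str, bytes],
    captured_at: Optional[datetime] = None,
) -> dict:
    """Record a SHA-256 digest and size for every preserved artifact."""
    captured_at = captured_at or datetime.now(timezone.utc)
    entries = [
        {
            "name": name,
            "sha256": hashlib.sha256(data).hexdigest(),
            "size_bytes": len(data),
        }
        for name, data in sorted(artifacts.items())
    ]
    return {"captured_at": captured_at.isoformat(), "artifacts": entries}


def verify(manifest: dict, artifacts: dict[str, bytes]) -> list[str]:
    """Return the names of artifacts that are missing or no longer match."""
    problems = []
    for entry in manifest["artifacts"]:
        data = artifacts.get(entry["name"])
        if data is None or hashlib.sha256(data).hexdigest() != entry["sha256"]:
            problems.append(entry["name"])
    return problems


if __name__ == "__main__":
    evidence = {
        "context_window.json": b'{"messages": []}',
        "tool_calls.jsonl": b'{"tool": "refund", "args": {}}\n',
    }
    manifest = build_manifest(evidence)
    print(json.dumps(manifest, indent=2))
    print("tampered:", verify(manifest, evidence))
```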

For multi-agent incidents, you need to reconstruct the cascade path. Map the flow: which agent produced the corrupted output, which agents consumed it and how far downstream the corruption traveled. A lineage graph tracking each step (user message, tool call, API response, agent response, downstream consumption) is essential for root cause analysis.
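
Tracing the cascade path amounts to a graph traversal over the lineage data. The sketch below assumes the lineage is available as a mapping from each producer to the steps that consumed its output (the node names are hypothetical) and walks it breadth-first from the agent that produced the corrupted output.

```python
from collections import deque


def downstream_of(edges: dict[str, list[str]], origin: str) -> list[str]:
    """Return every node that consumed data derived from `origin`.

    Nodes are returned in breadth-first discovery order, so nearer
    consumers appear before those further down the chain.
    """
    seen = {origin}
    order: list[str] = []
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for consumer in edges.get(node, []):
            if consumer not in seen:  # also guards against cycles
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order


if __name__ == "__main__":
    # Hypothetical lineage: producer -> consumers of its output.
    lineage = {
        "intake_agent": ["pricing_agent", "fraud_agent"],
        "pricing_agent": ["billing_agent"],
        "fraud_agent": ["billing_agent"],
        "billing_agent": [],
    }
    print(downstream_of(lineage, "intake_agent"))
```

Everything in the returned list is inside the blast radius and needs its outputs reviewed; everything outside it can be ruled out early.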

Regulatory notification timelines

The clock starts when you become “aware” of an incident, not when you confirm the root cause. Delaying investigation to avoid awareness is not a strategy. It is a compliance violation.

GDPR Article 33: 72 hours

If the incident involves personal data of EU residents:

Notify the supervisory authority within 72 hours of becoming aware
If you miss the deadline, you must explain the delay (late notification is a separate infringement)
Phased notification is permitted: submit what you know, then supplement
Document everything: facts, effects, remedial actions
Fines for notification failures: up to EUR 10 million or 2% of global annual turnover, whichever is higher

EU AI Act Article 73: 15 days (general), 2 days (widespread)

If the incident involves a high-risk AI system:

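Report to the market surveillance authority immediately after establishing a causal link (or the reasonable likelihood of one), and no later than 15 days after becoming aware of the serious incident
For a widespread infringement or a serious disruption of critical infrastructure: report within 2 days of becoming aware
For incidents involving death: report within 10 days of becoming aware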
The authority must take appropriate measures within 7 days of notification
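
Because these clocks run in parallel, it helps to compute every applicable deadline the moment an incident is logged. The sketch below uses the GDPR and EU AI Act windows stated above; the dictionary keys and function name are hypothetical, and `clock_start` is whatever timestamp starts the clock for each regime in your legal team's reading. It is an aid for tracking, not legal advice.

```python
from datetime import datetime, timedelta, timezone

# Notification windows as stated in this section.
NOTIFICATION_WINDOWS = {
    "gdpr_art33": timedelta(hours=72),
    "ai_act_art73_general": timedelta(days=15),
    "ai_act_art73_widespread": timedelta(days=2),
    "ai_act_art73_death": timedelta(days=10),
}


def notification_deadlines(
    clock_start: datetime,
    regimes: list[str],
) -> dict[str, datetime]:
    """Return the latest permissible notification time for each regime."""
    if clock_start.tzinfo is None:
        raise ValueError("clock_start must be timezone-aware")
    return {r: clock_start + NOTIFICATION_WINDOWS[r] for r in regimes}


def earliest_deadline(deadlines: dict[str, datetime]) -> tuple[str, datetime]:
    """Return the regime whose deadline expires first."""
    regime = min(deadlines, key=deadlines.get)
    return regime, deadlines[regime]


if __name__ == "__main__":
    start = datetime(2026, 1, 7, 8, 0, tzinfo=timezone.utc)
    due = notification_deadlines(start, ["gdpr_art33", "ai_act_art73_general"])
    for regime, when in due.items():
        print(regime, when.isoformat())
    print("first due:", earliest_deadline(due)[0])
```

Requiring a timezone-aware timestamp avoids an hours-wide error when responders and regulators sit in different time zones.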

Other frameworks

SEC (US public companies): Material cybersecurity incidents must be reported on Form 8-K within four business days of materiality determination
HIPAA (healthcare): Breaches affecting 500+ individuals require notification within 60 days of discovery
State laws (US): Vary by jurisdiction, some as short as 30 days

Incident costs are rising

20% of organizations suffered shadow AI breaches in 2025, with costs averaging $670,000 more than traditional security incidents. The AI Incident Database recorded 362 incidents in 2025, up 55% from 233 in 2024.

Source: IBM, 2025; Adversa AI, 2025

Root cause analysis for agent failures

Agent root cause analysis differs from traditional RCA because the failure mode is often behavioral, not structural.

The six agent-specific failure modes

Based on the OWASP Agentic AI taxonomy and production incident patterns:

Tool misuse: the agent called the right tool with the wrong arguments, or the wrong tool entirely. This is common after model updates that shift tool selection probabilities
Context loss: the agent lost critical context mid-workflow due to context window overflow, memory truncation or session state corruption
Goal drift: the agent’s objective shifted gradually through feedback loops, fine-tuning or accumulated context that biased its decision-making
Retry loops: the agent encountered an error, retried with the same failing approach and consumed resources or produced duplicate actions
Cascading errors: one agent’s corrupted output became another agent’s input, amplifying the failure through the chain
Silent quality degradation: output quality declined gradually, too slowly to trip automated monitoring thresholds but far enough that users eventually notice and complain

For each failure mode, the RCA process is the same:

Reconstruct the decision chain from preserved forensic evidence
Identify the divergence point: where did the agent’s behavior deviate from expected?
Determine the trigger: what changed? Model update, data shift, configuration change or emergent behavior?
Assess the blast radius: how far did the failure propagate?
Classify the root cause: was this a policy gap, a monitoring gap, a design flaw or an adversarial attack?
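
Identifying the divergence point can be partly mechanized when you have a reference chain, for example the tool call sequence from a known-good run of the same workflow. The sketch below compares the two sequences step by step and returns the first index where they differ. Representing each step as a string is a simplifying assumption; real chains would also compare arguments and responses.

```python
from itertools import zip_longest
from typing import Optional


def first_divergence(expected: list[str], actual: list[str]) -> Optional[int]:
    """Return the index of the first step where the chains differ.

    Returns None when the chains are identical. A chain that stops early
    or runs long diverges at the first missing or extra step.
    """
    for index, (want, got) in enumerate(zip_longest(expected, actual)):
        if want != got:
            return index
    return None


if __name__ == "__main__":
    # Hypothetical tool call sequences for a refund workflow.
    known_good = ["lookup_order", "check_policy", "escalate_to_human"]
    incident = ["lookup_order", "check_policy", "approve_refund"]
    step = first_divergence(known_good, incident)
    print("diverged at step", step)
```

Because agent output is non-deterministic, a divergence is a place to start looking, not proof of a fault: some variation between runs is expected and only a reviewer can say whether the deviation mattered.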

Post-incident governance hardening

Every incident is a governance improvement opportunity. The post-incident review should produce specific governance changes, not just a timeline of what happened.

Within 5 business days: blameless post-incident review

Run the review with all responders, the agent owner and a governance team representative. Document:

Timeline of events (detection to resolution)
What worked in the response
What did not work or was missing
Root cause and contributing factors
Blast radius (systems, users, data affected)

Within 10 business days: governance hardening actions

Based on the RCA, implement:

Policy updates: if the incident exposed a policy gap, update the policy and redeploy via policy-as-code
Monitoring additions: if the incident was invisible to existing monitoring, add the missing signal so every incident reduces future mean time to detect
Risk reclassification: if the agent’s actual risk exceeded its classified risk tier, upgrade it
Access control tightening: if the agent had access it did not need, revoke it and apply least-privilege retroactively
Tabletop exercise: run a simulated version of the incident with the broader team, because rehearsal is what makes a slow response faster the next time

Within 30 days: systemic improvements

Look beyond the individual agent:

Are there other agents with the same vulnerability?
Does the governance implementation need a new phase?
Should the organization’s agent risk classification criteria be updated?
Does the incident response plan need revision based on lessons learned?

The incident response runbook template

Use this as a starting point for your agent-specific incident response plan.

Detection and triage (first 15 minutes)

Confirm the incident is agent-related (not infrastructure, not human error)
Classify severity (P0-P3)
Assign incident commander
Notify stakeholders per severity level
Begin forensic evidence preservation

Containment (first hour)

Execute containment level appropriate to severity
Confirm containment is effective (agent is no longer producing harmful output)
Activate fallback for affected workflows
Assess regulatory notification obligations

Investigation (hours 1-24)

Complete forensic evidence collection
Reconstruct agent decision chain
Identify root cause and contributing factors
Map blast radius (systems, data, users affected)
Begin regulatory notifications if required

Recovery (hours 24-72)

Develop and test remediation plan
Implement fix in staging environment
Validate fix against the original failure scenario
Restore agent to production (or confirm decommissioning)
Verify monitoring captures the failure mode

Post-incident (days 3-30)

Conduct blameless post-incident review
Publish incident report to stakeholders
Implement governance hardening actions
Complete regulatory notifications and documentation
Update the incident response plan with lessons learned
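
If the runbook lives in an incident tracker, the phase windows above can drive which checklist is shown to responders. A minimal sketch, assuming the time boundaries in the template are used as upper bounds measured from detection:

```python
from datetime import timedelta
from typing import Optional

# (upper bound since detection, phase name), taken from the template.
PHASES = [
    (timedelta(minutes=15), "Detection and triage"),
    (timedelta(hours=1), "Containment"),
    (timedelta(hours=24), "Investigation"),
    (timedelta(hours=72), "Recovery"),
    (timedelta(days=30), "Post-incident"),
]


def current_phase(elapsed: timedelta) -> Optional[str]:
    """Return the runbook phase for the time elapsed since detection.

    Returns None once the 30-day post-incident window has passed.
    """
    for upper_bound, name in PHASES:
        if elapsed < upper_bound:
            return name
    return None


if __name__ == "__main__":
    for hours in (0.1, 0.5, 6, 30, 100):
        print(hours, "h ->", current_phase(timedelta(hours=hours)))
```

The windows are targets, not gates: a P0 incident may reach containment in minutes, and a complex investigation can run past 24 hours.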

Sources

Source	Date	URL
IBM, AI breaches and access control report	Jul 2025	https://newsroom.ibm.com/2025-07-30-ibm-report-13-of-organizations-reported-breaches-of-ai-models-or-applications,-97-of-which-reported-lacking-proper-ai-access-controls
Adversa AI, 2025 AI security incidents report	2025	https://adversa.ai/blog/adversa-ai-unveils-explosive-2025-ai-security-incidents-report-revealing-how-generative-and-agentic-ai-are-already-under-attack/
Microsoft, Incident response for AI	Apr 2026	https://www.microsoft.com/en-us/security/blog/2026/04/15/incident-response-for-ai-same-fire-different-fuel/
OWASP, Cascading failures in agentic AI (ASI08)	2026	https://adversa.ai/blog/cascading-failures-in-agentic-ai-complete-owasp-asi08-security-guide-2026/
CoSAI, AI Incident Response Framework v1.0	2025	https://www.coalitionforsecureai.org/defending-ai-systems-a-new-framework-for-incident-response-in-the-age-of-intelligent-technology/
EU AI Act, Article 73 (serious incident reporting)	2024	https://artificialintelligenceact.eu/article/73/
GDPR, Article 33 (breach notification)	2018	https://gdpr-info.eu/art-33-gdpr/
Grant Thornton, 2026 AI Impact Survey	2026	https://www.grantthornton.com/services/advisory-services/artificial-intelligence/2026-ai-impact-survey