Does SOC 2 specifically address AI agents?

Not yet. The AICPA's Trust Service Criteria (the 2017 criteria with points of focus revised in 2022) don't mention AI agents explicitly. But auditors are applying the existing criteria, particularly CC6.1 (logical access), CC7.2 (monitoring), PI1.1 (processing integrity) and CC8.1 (change management), to AI systems with increasing specificity. The framework is flexible enough to cover agents; the controls just need to be adapted.

Which Trust Service Criteria are most relevant for AI agents?

Security (CC series) is mandatory and covers agent identity, access controls, monitoring and sub-processor management. Processing Integrity (PI series) is the most challenging for agents because of non-deterministic outputs. Confidentiality (C series) applies when agents access sensitive data. Privacy (P series) applies when agents process personal information. Availability (A series) covers failover, kill switches and capacity.

How much does SOC 2 compliance cost for AI systems?

Costs vary significantly with scope and complexity. Readiness assessments typically cost $5,000-$25,000, infrastructure remediation $10,000-$100,000+ and external audit fees $20,000-$100,000+. A purpose-built governance platform can substantially reduce the infrastructure and evidence-collection costs.

How do I handle LLM providers as sub-processors?

Under CC9.2 (vendor and business partner risk management), with CC6.6 covering connections that cross your system boundary, document each LLM provider as a sub-processor. Review their SOC 2 reports annually. Ensure data processing agreements cover your use cases, particularly whether prompts containing PII are logged, retained or used for training. Maintain a vendor risk register that tracks each provider's compliance status.

What's the difference between SOC 2 Type I and Type II for AI?

Type I tests control design at a point in time. Type II tests operational effectiveness over an observation period, typically 6-12 months. For AI agents, whose behavior changes continuously, Type II is far more demanding because it requires evidence of continuous monitoring, drift detection and governance throughout the observation period.

Where does Roval fit?

Roval provides the infrastructure that generates SOC 2 audit evidence as a byproduct of agent governance. The agent registry satisfies CC6.1 (identity and access). The LLM proxy satisfies CC7.2 (monitoring) and C1.2 (confidentiality). Compliance certification with auto-expiry satisfies CC4.1 (assessment). Drift detection every 15 minutes satisfies CC8.1 (change management). The immutable audit trail satisfies CC7.1 (logging). The entire system exports as CSV/JSON for any auditor.

Does SOC 2 require AI-specific controls in 2026?

Not as written. The 2017 Trust Service Criteria with 2022 points of focus do not name AI, agents or large language models. But auditors now apply CC6.1, CC6.6, CC7.2, CC8.1, PI1.1 and C1.2 to agents in scope with specific expectations: per-agent identity, sub-processor management for LLM providers, prompt and completion logs, drift detection and PII handling rules. The criteria are unchanged. The evidence bar is much higher.

How does a SOC 2 audit for AI agents differ from a traditional SaaS audit?

Three things change. Identity extends from human users to autonomous agents, with each agent needing its own credential and tool-permission scope. Processing integrity must be expressed in probabilistic terms (accuracy thresholds, hallucination detection) rather than deterministic input-output mappings. And monitoring must cover machine-speed actions, which means logging every prompt, every completion and every policy violation, not only user-initiated events.

SOC 2 for AI agents: what your auditor will actually ask

Your next SOC 2 audit will include AI

SOC 2 audits are changing, but not because the AICPA rewrote the Trust Service Criteria. The 2017 TSC with the 2022 revised points of focus remain the governing standard. Audits are changing because auditors are now applying those criteria to a category of system the framework was never designed for: autonomous AI agents.

The shift is already measurable. SOC 2 benchmark data shows that the share of reports with more than 150 security controls rose from 16% to 23% in the past year. Confidentiality is now included in 64.4% of SOC 2 reports, up from 34% in 2023, and Availability appears in 75.3%. The scope is expanding because the systems in scope are expanding, and AI agents are the newest addition.

As one practitioner-focused analysis put it: “Your auditor is going to ask about it. They will ask how you version your models, how you test for bias, what happens when the model hallucinates and how you prove that a probabilistic system processes data with ‘integrity.’”

Source: BeyondScale, SOC 2 compliance for AI systems: what your auditor asks (practitioner guide)

If your organization deploys AI agents that access customer data, execute autonomous actions or connect to external systems, those agents are in your SOC 2 scope, and the Trust Service Criteria in your report apply to them in ways that require agent-specific controls your existing program almost certainly doesn’t cover.

This article maps each Trust Service Criterion to the specific controls AI agents require, identifies the questions your auditor will ask and provides a 25-control checklist you can use to prepare. The downloadable PDF version (below) is designed as a pre-audit worksheet.

What SOC 2 says about AI agents now

The 2017 Trust Service Criteria predate generative AI. They do not name large language models, autonomous agents, prompt injection or any AI-specific control. That gap is closing through two mechanisms.

First, the AICPA published examination guidance for AI systems and activities, giving organizations a standalone AI attestation option that runs alongside SOC 2. Some buyers in financial services and healthcare have started asking for it as a separate report on top of the SOC 2 Type II.

Second, audit firms now publish AI-specific control matrices that map back to the existing CC, PI, C, A and P criteria. The AICPA itself has not amended the criteria. Your auditor still tests CC6.1, CC7.2, PI1.1 and the rest. What changes is how much agent-specific evidence they expect to see for each one.

Three categories of agent-specific evidence have become standard expectations in 2026 audits:

Agent inventory with per-agent identity, owner, risk tier and certification status.
Prompt and completion logs with timestamps, model version, user identity and PII handling notes.
Drift evidence with a documented detection cadence and a sample of caught deviations across the observation period.

If you cannot produce all three on demand, your CC6.1, CC7.2 and CC8.1 controls are at risk regardless of how careful the rest of your program is. Most SOC 2 programs designed for SaaS infrastructure surface none of them, which is why governance teams now treat agent compliance certification as a separate workstream from the traditional security review.

Why SOC 2 breaks for agents

SOC 2 was designed for a world of deterministic systems with static controls and human-mediated access. AI agents violate all three assumptions.

Assumption 1: Controls can be documented and tested at a point in time. Traditional controls (a firewall rule, an access policy, an encryption configuration) remain effective until someone intentionally changes them. Agent behavior changes without anyone changing the agent. The underlying model updates. The data distribution shifts. A tool the agent depends on modifies its API. An agent that was compliant last month may not be compliant today, even though nobody touched it.

Assumption 2: Systems process information “as intended” based on documented logic. Traditional software follows deterministic logic: given input X, produce output Y. AI agents produce non-deterministic outputs that vary with context and they exhibit emergent behavior, actions arising from complex interactions rather than explicit programming. You can’t document the agent’s “intended” processing logic the way you document a database query, because the agent’s behavior is shaped by its model, its prompt, its tools and the specific data it encounters at runtime.

Assumption 3: Access is human-mediated. Traditional SOC 2 controls assume humans access systems and controls govern that human access. Agents access systems autonomously (calling APIs, querying databases, executing code and interacting with other agents) without a human initiating each action. The access control model must extend to non-human autonomous identities and the monitoring must cover actions that happen at machine speed.

These aren’t theoretical gaps. Among organizations that suffered AI-related breaches, 97% lacked proper access controls. Meanwhile, 33% of organizations lack audit trails entirely and 61% have fragmented logs across systems, meaning the evidence your auditor needs doesn’t exist in a queryable form.

OWASP’s Top 10 lists for LLM and agentic applications identify risks such as excessive agency, improper output handling and insecure tool integration, each of which maps to SOC 2 Trust Service Criteria. If your agents can take autonomous actions, those actions need the same level of access control, monitoring and audit logging you apply to human users.

The five Trust Service Criteria applied to AI agents

Security (Common Criteria, mandatory)

Security is the only mandatory TSC in every SOC 2 audit. For AI agents, the security criteria extend beyond traditional infrastructure controls to cover agent-specific access patterns.

What your auditor will ask:

How do you authenticate agents to LLM APIs and external services? (CC6.1)
How do you enforce least-privilege access for each agent’s tool permissions? (CC6.3)
How do you manage API keys (rotation schedule, storage, scoping)? (CC6.1)
How do you monitor for prompt injection and data exfiltration attempts? (CC7.2)
How do you manage LLM providers as sub-processors? (CC6.6, CC9.2)

Source: Teleport, How AI agents impact SOC 2 Trust Services Criteria

The controls you need:

Every agent must have a unique, verifiable identity: not a shared service account, and not a developer’s personal API key. CC6.1 requires logical access security over protected information assets; for AI environments, this covers model repositories, training data and inference APIs.
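A minimal sketch of what "unique, verifiable identity" can look like in an agent registry, assuming an in-memory store. The class names, tier labels and the credential-reuse rule are illustrative assumptions, not Roval's API or any auditor's requirement.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Illustrative tier names (hypothetical); substitute your own risk taxonomy.
RISK_TIERS = ("Low", "High", "Critical")


@dataclass(frozen=True)
class AgentRecord:
    name: str
    owner: str          # accountable person, not a shared team alias
    risk_tier: str
    credential_id: str  # credential issued to this agent and no other
    agent_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class AgentRegistry:
    """Hypothetical in-memory registry; a real one would persist and audit every change."""

    def __init__(self) -> None:
        self._agents: Dict[str, AgentRecord] = {}
        self._bound_credentials: Set[str] = set()

    def register(self, name: str, owner: str, risk_tier: str, credential_id: str) -> AgentRecord:
        if risk_tier not in RISK_TIERS:
            raise ValueError(f"unknown risk tier: {risk_tier}")
        if not owner:
            raise ValueError("every agent needs a named owner")
        # A credential shared by two agents makes their actions unattributable.
        if credential_id in self._bound_credentials:
            raise ValueError(f"credential {credential_id} is already bound to another agent")
        record = AgentRecord(name, owner, risk_tier, credential_id)
        self._agents[record.agent_id] = record
        self._bound_credentials.add(credential_id)
        return record

    def inventory(self) -> List[AgentRecord]:
        """Point-in-time snapshot, e.g. for a monthly registry export."""
        return sorted(self._agents.values(), key=lambda r: r.name)
```

Rejecting a reused credential at registration time is one way to keep the "no shared service accounts" rule enforceable rather than merely documented.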

Tool access must follow least privilege. CC6.3 requires access to be authorized based on roles and responsibilities, with consideration of least privilege; for agents, that means each agent operates with only the permissions required for its specific task. An agent that needs read access to a customer database should not have write access. An agent that calls one API should not have credentials for ten. Permissions must be reviewed regularly, and elevated rights should be time-bound and logged.
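The least-privilege rule above can be sketched as a deny-by-default check with expiring elevations. This assumes permissions are modeled as (tool, action) pairs; `ToolPolicy` and `Grant` are hypothetical names, and a production system would enforce this in a gateway or policy engine rather than in the agent process.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass
class Grant:
    tool: str
    action: str                            # e.g. "read" or "write"
    expires_at: Optional[datetime] = None  # None = standing grant; set for elevations


class ToolPolicy:
    """Deny-by-default tool authorization for one agent (hypothetical sketch)."""

    def __init__(self, agent_id: str) -> None:
        self.agent_id = agent_id
        self.grants: List[Grant] = []
        # (timestamp, agent_id, tool, action, allowed) for every decision
        self.decisions: List[Tuple[datetime, str, str, str, bool]] = []

    def grant(self, tool: str, action: str, ttl: Optional[timedelta] = None,
              now: Optional[datetime] = None) -> None:
        now = now if now is not None else datetime.now(timezone.utc)
        # Time-bound elevation: a ttl turns the grant into an expiring one.
        expires = now + ttl if ttl is not None else None
        self.grants.append(Grant(tool, action, expires))

    def is_allowed(self, tool: str, action: str, now: Optional[datetime] = None) -> bool:
        now = now if now is not None else datetime.now(timezone.utc)
        allowed = any(
            g.tool == tool and g.action == action
            and (g.expires_at is None or now < g.expires_at)
            for g in self.grants
        )
        # Log denials as well as approvals; both are access-review evidence.
        self.decisions.append((now, self.agent_id, tool, action, allowed))
        return allowed
```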

LLM API providers are sub-processors. CC9.2 requires the entity to assess and manage risks from vendors and business partners, which in practice means documenting sub-processor security requirements, reviewing their SOC 2 reports annually and maintaining contractual security obligations. If your agents call Anthropic, OpenAI or any other model provider, that relationship needs to be in your vendor risk management program.
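A vendor risk register can flag stale reviews automatically. The sketch below assumes a 365-day review cycle and two DPA checks drawn from the text (a signed DPA and a bar on training with your prompts); the field names and provider names are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class SubProcessor:
    name: str
    soc2_report_reviewed: Optional[date]  # when you last reviewed their SOC 2 report
    dpa_signed: bool
    dpa_excludes_training: bool           # DPA bars training on your prompts


def review_findings(vendors: Iterable[SubProcessor], today: date,
                    max_age_days: int = 365) -> List[str]:
    """Return human-readable findings for the vendor risk register (hypothetical rules)."""
    findings = []
    for v in vendors:
        if v.soc2_report_reviewed is None:
            findings.append(f"{v.name}: no SOC 2 report on file")
        elif (today - v.soc2_report_reviewed).days > max_age_days:
            findings.append(f"{v.name}: SOC 2 review older than {max_age_days} days")
        if not v.dpa_signed:
            findings.append(f"{v.name}: no data processing agreement")
        elif not v.dpa_excludes_training:
            findings.append(f"{v.name}: DPA does not exclude training on prompts")
    return findings
```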

Prompt and completion logging is now a CC6.1 and CC7.2 expectation. Every prompt sent to a model and every response received (particularly when prompts contain PII or when outputs may contain sensitive information) must be logged with timestamps, user identity and model version.
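One way to make those logs queryable is a fixed-schema JSON line per model call. This is a sketch under assumptions: the field set mirrors the text (timestamp, user identity, model version), while `LLMLogRecord`, `to_log_line` and the `contains_pii` flag (set by some upstream detector, not shown) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class LLMLogRecord:
    timestamp: str        # ISO 8601, UTC
    agent_id: str
    user_id: str          # human on whose behalf the agent acted
    model: str            # exact model version string used for the call
    prompt: str
    completion: str
    contains_pii: bool    # hypothetical flag from an upstream PII detector


def to_log_line(agent_id: str, user_id: str, model: str, prompt: str, completion: str,
                contains_pii: bool, now: Optional[datetime] = None) -> str:
    """Serialize one prompt/completion pair as a single JSON line."""
    ts = (now if now is not None else datetime.now(timezone.utc)).isoformat()
    record = LLMLogRecord(ts, agent_id, user_id, model, prompt, completion, contains_pii)
    # sort_keys keeps the layout stable, which makes diffs and sampling easier.
    return json.dumps(asdict(record), sort_keys=True)
```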

Roval implementation: The LLM request monitor captures every prompt through a transparent proxy with under 1ms of overhead. Every request is logged with full text, model, token counts, user identity and timestamp. Policy rules evaluate violations within 30 seconds. The agent registry assigns unique identities and tracks tool access permissions per agent.

Availability (A series)

Availability is included in 75.3% of SOC 2 reports and is critical for any agent that supports business operations or customer-facing services.

What your auditor will ask:

What happens when an agent fails? Is there a fallback?
Can you immediately halt an agent that’s behaving abnormally?
What are your uptime commitments for agent-dependent services?
How do you handle capacity planning for variable token consumption?

The controls you need:

Agents must be designed with an explicit failure mode: fail-closed (the agent stops) or fail-open (the agent continues). If the agent can’t reach its LLM provider, what happens? If the monitoring layer goes down, does the agent stop or continue unmonitored? The architecture must define these failure modes explicitly.

A kill switch is essential. For high-risk AI systems, Article 14 of the EU AI Act requires that humans be able to interrupt the system through a “stop” button or a similar procedure. On the SOC 2 side, auditors look for the ability to halt problematic systems under the availability criteria, where A1.3 covers testing of recovery procedures. Test the kill switch quarterly: trigger, access revocation, state preservation and notification chain.
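The four-step kill-switch test above can be expressed as a function whose return value doubles as drill evidence. This is a minimal sketch assuming an in-memory agent object; `AgentRuntime`, `kill` and the notification callback are hypothetical, and a real implementation would revoke credentials at the identity provider or secrets manager rather than in process memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict


@dataclass
class AgentRuntime:
    agent_id: str
    running: bool = True
    credentials: set = field(default_factory=set)
    state: dict = field(default_factory=dict)


def kill(agent: AgentRuntime, notify: Callable[[str], None],
         snapshot_store: Dict[str, dict], reason: str) -> dict:
    """Halt an agent: trigger, revoke access, preserve state, notify (hypothetical)."""
    steps = []
    agent.running = False                                   # 1. trigger
    steps.append("halted")
    revoked = len(agent.credentials)
    agent.credentials.clear()                               # 2. revoke access
    steps.append(f"revoked {revoked} credentials")
    snapshot_store[agent.agent_id] = dict(agent.state)      # 3. preserve state for review
    steps.append("state preserved")
    notify(f"[kill-switch] {agent.agent_id} halted: {reason}")  # 4. notification chain
    steps.append("notified")
    # The returned record is what a quarterly drill would file as evidence.
    return {"agent_id": agent.agent_id,
            "at": datetime.now(timezone.utc).isoformat(),
            "steps": steps}
```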

Circuit breakers prevent cascading failures. When an agent exceeds a violation threshold or error rate, the circuit breaker trips automatically, blocking further actions until an administrator reviews and resets. This is availability governance at the agent level.
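A sketch of the circuit breaker just described, assuming violations arrive as timestamps in seconds and that "exceeds a threshold" means reaching N violations inside a sliding window. The names are hypothetical. The common half-open state is deliberately omitted because the text requires an administrator to review and reset.

```python
from collections import deque
from typing import Deque, List


class CircuitBreaker:
    """Trips after `threshold` violations within `window_seconds`; only a human reset closes it."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._violations: Deque[float] = deque()
        self.is_open = False
        self.resets: List[str] = []  # who reset the breaker, kept as audit evidence

    def record_violation(self, t: float) -> None:
        self._violations.append(t)
        # Drop violations that have aged out of the sliding window.
        while self._violations and self._violations[0] <= t - self.window_seconds:
            self._violations.popleft()
        if len(self._violations) >= self.threshold:
            self.is_open = True

    def allow(self) -> bool:
        """Check before each agent action; an open breaker blocks everything."""
        return not self.is_open

    def reset(self, admin: str) -> None:
        # No automatic recovery: an administrator reviews, then resets.
        self.is_open = False
        self._violations.clear()
        self.resets.append(admin)
```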

Roval implementation: The Observer’s circuit breaker auto-stops agents exceeding violation thresholds. The LLM proxy is fail-open by design: if monitoring goes down, agents continue operating without telemetry being dropped and developer sessions never break. Kill switch capability is built into the lifecycle management layer.

Processing integrity (PI series)

This is where AI agents create the most novel challenges for SOC 2. Processing Integrity (PI1) requires that system processing is “complete, valid, accurate, timely and authorized.” For a deterministic system, these terms have clear definitions. For an AI agent, they require reinterpretation.

What your auditor will ask:

How do you define “accurate” for a non-deterministic system?
How do you detect hallucinations?
How do you monitor for behavioral drift?
What guardrails prevent the agent from taking actions outside its defined scope?
How do you validate that outputs meet quality thresholds?

The controls you need:

Define accuracy in probabilistic terms. For classification tasks, this might mean “at least 95% accuracy on validation data, measured weekly.” For generative tasks, “outputs are factually grounded in provided context at least 98% of the time.” The key is that the threshold is documented, measurable and monitored, not aspirational.
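To make a threshold like "at least 95% accuracy on validation data" measurable, compare the observed rate with the documented target. The sketch below also computes a Wilson score lower bound, an assumption on our part rather than anything SOC 2 prescribes, so a small validation set cannot clear the bar by luck.

```python
import math


def wilson_lower_bound(correct: int, total: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if total == 0:
        return 0.0
    p = correct / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def accuracy_check(correct: int, total: int, threshold: float = 0.95) -> dict:
    """Report both the point estimate and a conservative check against the documented threshold."""
    point = correct / total if total else 0.0
    lower = wilson_lower_bound(correct, total)
    return {"accuracy": point, "lower_bound": lower,
            "meets_point": point >= threshold,
            "meets_conservative": lower >= threshold}
```

For example, 96 correct out of 100 meets 95% on the point estimate, but its lower bound is roughly 0.90, while 980 out of 1,000 passes both checks.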

Document acceptable output boundaries. What outputs are never acceptable? An agent processing financial data should never fabricate transaction records. An agent communicating with customers should never disclose internal pricing formulas. These boundaries must be enforced through guardrails, not just documented in a policy.
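Here is one way a boundary such as "never fabricate transaction records" could become an enforced guardrail instead of a policy line. This sketch assumes a hypothetical `TXN-` plus six-digit ID format and a simple banned-phrase list. Any output that cites a transaction ID absent from the source data, or contains a prohibited phrase, is flagged so it can be blocked before delivery.

```python
import re
from typing import Iterable, List, Set

# Hypothetical transaction-ID format; replace with your own identifier scheme.
TXN_ID = re.compile(r"\bTXN-\d{6}\b")


def check_output(output: str, source_txn_ids: Set[str],
                 banned_phrases: Iterable[str]) -> List[str]:
    """Return violations; an empty list means the output may be released."""
    violations = []
    # Grounding check: every transaction the agent mentions must exist in its inputs.
    for txn in TXN_ID.findall(output):
        if txn not in source_txn_ids:
            violations.append(f"fabricated transaction id: {txn}")
    # Content boundary: case-insensitive match on phrases that must never appear.
    lowered = output.lower()
    for phrase in banned_phrases:
        if phrase.lower() in lowered:
            violations.append(f"prohibited content: {phrase}")
    return violations
```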

Monitor for drift continuously. Agent drift detection should include tracking output distributions, flagging when behavior deviates from the certified baseline and triggering review when thresholds are exceeded. A point-in-time assessment is inadequate. The auditor will test whether your drift detection operated effectively throughout the observation period.
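"Tracking output distributions" can be as simple as comparing category frequencies in a recent window against the certified baseline. This sketch uses total variation distance with a hypothetical 0.1 threshold. Both the metric and the threshold are assumptions to calibrate per agent; population stability index or KL divergence are common alternatives.

```python
from collections import Counter
from typing import Dict, Iterable


def distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each output category (e.g. which tool the agent called)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: v / total for k, v in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the L1 distance between two categorical distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def drift_alert(baseline_labels, window_labels, threshold: float = 0.1) -> dict:
    tv = total_variation(distribution(baseline_labels), distribution(window_labels))
    return {"tv_distance": tv, "drifted": tv > threshold}
```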

Roval implementation: Policy rules define prohibited content patterns, model allowlists and prompt size limits, evaluated automatically within 30 seconds of capture. The Observer builds behavioral baselines after 30+ tool calls and highlights deviations. Drift detection runs every 15 minutes, catching certification expiry, configuration changes and behavioral anomalies.

Confidentiality (C series)

Confidentiality is now included in 64.4% of SOC 2 reports, nearly double the 34% from 2023. For AI agents, confidentiality concerns center on what data enters the agent’s context window and what leaves it.

What your auditor will ask:

Can agents access confidential data? Which ones, and with what controls?
Does PII appear in prompts sent to LLM APIs?
Can the agent’s outputs leak confidential information?
How do you classify data sensitivity for agent-accessible sources?
What are your retention and deletion policies for captured prompts?

The controls you need:

Data classification must extend to agent-accessible sources. Every data source an agent can query needs a sensitivity classification: public, internal, confidential, restricted. The agent’s risk tier should reflect the highest classification of data it can access.
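The rule that an agent's risk tier reflects the highest classification it can reach is a simple maximum over ordered levels. The sketch uses the four levels named above; the source names and function are hypothetical, and an unclassified source raises an error rather than defaulting silently.

```python
from enum import IntEnum
from typing import Dict, List


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def agent_data_sensitivity(source_classes: Dict[str, Sensitivity],
                           agent_sources: List[str]) -> Sensitivity:
    """Highest classification among the sources an agent can query."""
    unknown = [s for s in agent_sources if s not in source_classes]
    if unknown:
        # Unclassified sources are a finding in their own right; don't guess a level.
        raise ValueError(f"unclassified sources: {unknown}")
    return max((source_classes[s] for s in agent_sources), default=Sensitivity.PUBLIC)
```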

PII in prompts requires specific handling. When agents send prompts containing customer names, account numbers or other PII to external LLM APIs, that data is leaving your environment. Controls must address whether PII is scrubbed before transmission, whether the LLM provider’s data processing agreement covers this use case and whether prompt logs are retained with appropriate access controls.
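A minimal illustration of scrubbing before transmission, assuming regex-detectable identifiers. The `ACCT-` account format is hypothetical, and pattern matching like this will miss free-text PII such as customer names, so treat it as a placeholder for a dedicated PII detection service. Returning the per-type counts gives you something to record in the prompt log.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Illustrative patterns only; they do not catch names or other free-text PII.
PII_PATTERNS: List[Tuple[str, Pattern[str]]] = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("ACCOUNT", re.compile(r"\bACCT-\d{8}\b")),  # hypothetical account-number format
]


def scrub(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace matches with typed placeholders and count what was removed."""
    counts: Dict[str, int] = {}
    for label, pattern in PII_PATTERNS:
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts
```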

Output filtering prevents confidential data leakage. If an agent has access to confidential data, its outputs must be monitored for inadvertent disclosure: customer data appearing in summaries, internal pricing in customer-facing communications or proprietary information in external API calls.

Roval implementation: The agent registry’s risk classification includes a data sensitivity dimension (public / internal / confidential / restricted) that drives governance requirements. The LLM proxy captures full prompt text with configurable response capture (opt-in, PII-scrubbed). Retention is 90 days with export before deletion.

Privacy (P series)

Privacy criteria apply when agents process personal information, as most enterprise agents do. The Privacy TSC aligns closely with GDPR requirements, making it particularly relevant for European enterprises.

What your auditor will ask:

Does the agent process personal data? Have you disclosed this to data subjects?
How do you handle data subject access requests for data processed by agents?
What data minimization principles apply to agent prompts?
How long is personal data retained in agent logs?

The controls you need:

Privacy notices must disclose AI processing. If an agent makes decisions about individuals (eligibility assessments, risk scoring, customer routing), the individuals must be informed that AI is involved in the process.

Data minimization applies to prompts. Agents should receive only the personal data necessary for their task, not the customer’s entire profile when only their account status is needed. Over-provisioning context is both a privacy risk and a cost issue.
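Data minimization for prompts can be enforced with a per-task field allowlist, so an account-status check never sees the full profile. The task names and field sets below are hypothetical examples. A task without an allowlist fails closed instead of receiving everything.

```python
from typing import Dict, Set

# Hypothetical task-to-field allowlist; define one entry per agent task.
TASK_FIELDS: Dict[str, Set[str]] = {
    "check_account_status": {"customer_id", "account_status"},
    "update_shipping": {"customer_id", "shipping_address"},
}


def minimal_context(profile: dict, task: str) -> dict:
    """Return only the profile fields the task is allowed to see."""
    allowed = TASK_FIELDS.get(task)
    if allowed is None:
        raise KeyError(f"no field allowlist defined for task {task!r}")
    return {k: v for k, v in profile.items() if k in allowed}
```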

Retention limits must cover agent logs. If your LLM request logs contain personal data (customer names in prompts, account numbers in context), those logs are subject to your data retention policy. The 90-day retention period common in logging infrastructure may or may not align with your privacy obligations.
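A sketch of applying a retention limit to prompt logs, assuming each record carries a timezone-aware `timestamp`. The retention period is a parameter precisely because it should come from your privacy obligations, not from a logging default. Whether expired records are exported to an archive first (the `export` callback) or deleted outright is a policy decision this sketch does not make for you.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, List


def apply_retention(records: List[dict], retention_days: int, now: datetime,
                    export: Callable[[List[dict]], None]) -> List[dict]:
    """Hand expired records to `export`, then return only the records still in retention."""
    cutoff = now - timedelta(days=retention_days)
    expired = [r for r in records if r["timestamp"] < cutoff]
    if expired:
        export(expired)  # e.g. an access-controlled archive, only if policy allows one
    return [r for r in records if r["timestamp"] >= cutoff]
```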

Roval implementation: The compliance certification workflow supports GDPR as a built-in framework alongside SOC 2, with per-requirement evidence tracking. The agent registry’s regulatory exposure dimension flags agents subject to privacy obligations. Audit trail exports (CSV/JSON) support data subject access requests by filtering for specific individuals.

The 10 questions your auditor will ask

Based on practitioner reporting and the evolving audit environment, here are the ten questions you should prepare for:

How many AI agents are operating in your environment, and where is the inventory? (CC6.1)
How do you authenticate and authorize agent access to systems and data? (CC6.1, CC6.3)
How do you manage API keys for LLM providers, and what’s the rotation schedule? (CC6.1, CC6.6)
How do you validate processing integrity for a system with non-deterministic outputs? (PI1.1)
How do you detect and respond to model drift or behavioral changes? (CC8.1, PI1.3)
Can you produce an audit trail showing what an agent did on a specific date? (CC7.1, CC7.2)
What’s your incident response plan for an agent-specific failure? (CC7.3, CC7.4)
How do you handle PII in prompts sent to external LLM providers? (C1.1, P1.1)
Which LLM providers are sub-processors, and where are their SOC 2 reports? (CC6.6, CC9.2)
Can you immediately halt an agent that’s behaving abnormally? (A1.3)

If you can answer all ten with documented evidence, you’re ahead of most organizations. If you can’t, the 25-control checklist below provides the roadmap.

The checklist maps all 25 controls (15 pre-deployment, 10 post-deployment) to specific Trust Service Criteria codes so your compliance team can integrate them directly into your existing control matrix.

Continuous compliance vs. point-in-time: why Type II changes everything

SOC 2 Type I tests whether controls are designed effectively at a single point in time. Type II tests whether they operated effectively throughout the observation period, typically 6 to 12 months. The difference matters enormously for AI agents.

SOC 2 Type II requires an extended observation period, with windows of at least three months commonly accepted for first-time engagements. Auditors will sample evidence from across the entire period. If your drift detection was configured in month 1 but failed silently in month 4, the auditor will find the gap.

For AI agents, this means governance must be continuous, not periodic. Certification expiry dates force re-review on a defined cadence. Drift detection runs automatically and generates evidence as a byproduct. Audit logs accumulate continuously and are immutable. Access reviews happen quarterly, not annually.
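"Immutable" in practice usually means write-once storage, but a hash chain is one common way to make an audit log tamper-evident. In the sketch below, each entry commits to the hash of the previous one, so editing any past entry breaks verification. This is an illustrative technique we chose, not a description of any particular product's log format.

```python
import hashlib
import json
from typing import List


class AuditLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: List[dict] = []

    @staticmethod
    def _digest(prev: str, event: dict) -> str:
        body = json.dumps(event, sort_keys=True)  # canonical serialization
        return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        digest = self._digest(prev, event)
        self.entries.append({"event": dict(event), "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed or reordered entry makes this False."""
        prev = self.GENESIS
        for e in self.entries:
            if e["prev"] != prev or self._digest(prev, e["event"]) != e["hash"]:
                return False
            prev = e["hash"]
        return True
```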

The organizations that pass Type II audits with AI agents in scope are the ones that build continuous evidence generation into their agent governance infrastructure from day one, not the ones that scramble to reconstruct evidence in the weeks before fieldwork begins.

AI agents need to be treated as first-class identities in your security infrastructure. The same rigor you apply to human access (authentication, authorization, monitoring, least privilege) must extend to every autonomous agent operating in your environment.

Roval’s architecture is designed for exactly this. Certifications auto-expire by risk tier (90 days for Critical, 180 for High, 365 for Low). Drift detection runs every 15 minutes and creates timestamped alert records. Every state change (registration, classification, certification, configuration change, ownership transfer) is recorded in an immutable audit log. The complete trail exports as CSV or JSON, filtered by resource, actor, action or date range. When the auditor arrives, evidence isn’t prepared; it’s exported.
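The tier-based expiry rule can be sketched with the three durations stated above. This is not Roval's implementation; the function names are hypothetical, and any tier the text doesn't list (a Medium tier, for instance) is deliberately left undefined, so it raises a `KeyError` instead of receiving a guessed duration.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple

# Durations as stated in the text; other tiers are intentionally not assumed.
CERT_VALIDITY_DAYS = {"Critical": 90, "High": 180, "Low": 365}


def cert_expiry(certified_on: date, risk_tier: str) -> date:
    return certified_on + timedelta(days=CERT_VALIDITY_DAYS[risk_tier])


def expired_agents(agents: Iterable[Tuple[str, str, date]], today: date) -> List[str]:
    """agents: (agent_id, risk_tier, certified_on). A certificate is expired on its expiry date."""
    return [aid for aid, tier, certified in agents if cert_expiry(certified, tier) <= today]
```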

What 6 months of Type II evidence looks like for agents

A Type II audit samples evidence from across the entire observation period. For a 6-month window with AI agents in scope, your auditor will expect monthly artifacts in at least four categories.

Monthly agent registry snapshot. A point-in-time export showing every agent, owner, risk tier and certification status. If a critical agent’s owner left in month 3 and ownership was reassigned in month 5, both dates must appear in the audit trail.

Sampled prompt and completion logs. Ten to twenty random samples per agent per month, with full text, model version, token counts and any policy violations triggered. The auditor will check that PII-handling rules were applied consistently across the sample.
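For internal readiness checks, a seeded random draw of roughly 15 records per agent per month (inside the 10-20 range above) lets you rehearse the sample and reproduce it later. Auditors normally select their own samples, so this is preparation, not a substitute. The record keys `agent_id` and `month` are assumed.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def sample_logs(records: List[dict], per_agent: int = 15,
                seed: int = 2026) -> Dict[Tuple[str, str], List[dict]]:
    """Group records by (agent_id, month) and draw up to `per_agent` from each group."""
    rng = random.Random(seed)  # fixed seed so the same draw can be reproduced
    groups: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for r in records:
        groups[(r["agent_id"], r["month"])].append(r)
    # Sorting the groups keeps the draw order, and therefore the sample, deterministic.
    return {key: rng.sample(items, min(per_agent, len(items)))
            for key, items in sorted(groups.items())}
```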

Drift detection summary. A monthly report of every drift alert raised, what was flagged and what action was taken. Silent months are fine. Silent quarters are not, unless the drift detection system itself can be shown to have run continuously through the gap.
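Showing that the drift detector "ran continuously through the gap" comes down to checking its run history. This sketch assumes you have a list of run timestamps and the 15-minute cadence mentioned earlier, plus a hypothetical 5-minute tolerance. It returns every interval where consecutive runs were further apart than expected.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def run_gaps(run_times: List[datetime],
             expected_interval: timedelta = timedelta(minutes=15),
             tolerance: timedelta = timedelta(minutes=5)) -> List[Tuple[datetime, datetime]]:
    """Return (last_run_before_gap, first_run_after_gap) pairs that exceed the allowed spacing."""
    ordered = sorted(run_times)
    limit = expected_interval + tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]
```

An empty result across the observation period is the evidence that a quiet month was quiet because nothing drifted, not because the detector stopped.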

Quarterly access reviews. Documented evidence that someone reviewed each agent’s tool permissions, revoked unused access and reassigned any orphaned ownership. Most SOC 2 programs already do quarterly reviews for human users. Agents are now part of the same cadence.
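A quarterly access review for agents can start from two mechanical checks: permissions not used recently, and agents whose owner is no longer on staff. The 90-day "unused" window and the input shapes are assumptions for illustration. The output is a worklist for a human reviewer, not an automatic revocation.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple


def access_review(grants: Dict[str, Set[str]],
                  last_used: Dict[Tuple[str, str], datetime],
                  owners: Dict[str, str],
                  active_staff: Set[str],
                  now: datetime,
                  unused_after_days: int = 90) -> Dict[str, list]:
    """Flag stale permissions and orphaned agents for reviewer sign-off."""
    cutoff = now - timedelta(days=unused_after_days)
    findings: Dict[str, list] = {"revoke": [], "orphaned": []}
    for agent_id, perms in sorted(grants.items()):
        for perm in sorted(perms):
            used = last_used.get((agent_id, perm))
            # Never-used and long-unused permissions are both revocation candidates.
            if used is None or used < cutoff:
                findings["revoke"].append((agent_id, perm))
        if owners.get(agent_id) not in active_staff:
            findings["orphaned"].append(agent_id)
    return findings
```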

Most organizations can pass a SOC 2 Type I with a strong policy document and a clean point-in-time configuration. Type II is different. The audit fails when the evidence shows the control existed but stopped operating in month 4 and nobody noticed until the auditor pulled the sample in month 6.

The 25-control checklist

We’ve published a comprehensive SOC 2 Audit Readiness Checklist for AI Agents as a downloadable PDF. It covers 15 pre-deployment controls and 10 post-deployment controls, each mapped to specific Trust Service Criteria codes.

Pre-deployment controls span three categories: agent identity and access (unique IDs, ownership, least-privilege tool access, API key management, role-based access), risk classification and governance (multi-dimensional classification, compliance framework mapping, human-in-the-loop thresholds, production gates) and testing and documentation (adversarial testing, processing integrity thresholds, technical documentation, incident response plans, data lineage, audit trail enablement).

Post-deployment controls cover continuous monitoring (behavioral observability, drift detection, LLM request capture, performance metrics) and compliance lifecycle (certification currency, audit log review cadence, sub-processor SOC 2 review, quarterly access reviews, kill switch testing and decommissioning procedures).

Each control includes the applicable SOC 2 criteria codes so your compliance team can map them directly to your existing control matrix.

Roval’s architecture is designed to generate SOC 2 audit evidence as a byproduct of agent governance. The agent registry satisfies CC6.1 (identity and access), the LLM monitor satisfies CC7.2 (monitoring) and C1.2 (confidentiality) and compliance certification with auto-expiry satisfies CC4.1 (assessment). Drift detection every 15 minutes satisfies CC8.1 (change management).

Sources and further reading

Source	URL
AICPA, 2017 Trust Services Criteria (Revised 2022)	https://www.aicpa-cima.com/resources/download/2017-trust-services-criteria-with-revised-points-of-focus-2022
AICPA, SOC 2 for Service Organizations	https://www.aicpa-cima.com/topic/audit-assurance/audit-and-assurance-greater-than-soc-2
CBIZ, 2024 SOC Benchmark Study	https://www.cbiz.com/insights/article/the-evolution-of-soc-reporting-key-findings-from-the-2024-soc-benchmark-study-part-two
Konfirmity, What Changed in SOC 2 for 2026	https://www.konfirmity.com/blog/soc-2-what-changed-in-2026
BeyondScale, SOC 2 for AI Systems: What Your Auditor Will Ask	https://beyondscale.tech/blog/soc2-compliance-ai-systems
Blaxel, SOC 2 Compliance for AI Agents in 2026	https://blaxel.ai/blog/soc-2-compliance-ai-guide
LetsAskClaire, SOC 2 Type II for AI Systems	https://www.letsaskclaire.com/security/soc2-type2-ai
Teleport, AI Agents and SOC 2	https://goteleport.com/blog/ai-agents-soc-2/
The Mavericks Co, SOC 2 AI Compliance News 2026	https://themavericksco.com/soc2/soc-2-ai-compliance-news-security-audit-trends/
Fieldguide, Four Steps to Year-Round SOC 2 Compliance	https://www.fieldguide.io/resource-articles/ai-continuous-soc-2-compliance
Konfirmity, SOC 2 Least Privilege	https://www.konfirmity.com/blog/soc-2-least-privilege-for-soc-2
DreamFactory, Enterprise AI Data Governance Statistics	https://www.dreamfactory.com/hub/enterprise-ai-data-governance-statistics
Kiteworks, 2026 Data Security Forecast	https://www.kiteworks.com/cybersecurity-risk-management/2026-data-security-forecast-ai-governance-predictions/
Anyreach, SOC 2 Compliance in Agentic Systems	https://blog.anyreach.ai/enterprise-ai-security-how-soc2-compliance-protects-your-data-in-agentic-systems/
EU AI Act, Article 14 (Human Oversight)	https://artificialintelligenceact.eu/article/14/
Roval, The AI Agent Governance Framework (8 Pillars)	/research/blog/ai-agent-governance-framework-8-pillars
Roval, EU AI Act and Your AI Agents	/research/blog/eu-ai-act-ai-agents-2026
