---
title: "Guardian agents: when AI governs AI"
date: 2026-04-19
author: david
excerpt: "Humans can't watch every agent, every action, every second. Guardian agents (dedicated policy engines that monitor, audit and enforce governance on other agents at runtime) resolve the tension between machine-speed execution and meaningful oversight."
category: governance
tags: [governance, guardian-agents, runtime-enforcement, containment, eu-ai-act, compliance]
draft: false
tldr: "Guardian agents are deterministic policy engines (not LLMs) that intercept every agent action before execution, evaluate it against defined policies and block violations in real-time. They perform five functions: runtime policy enforcement, behavioral conformance monitoring, drift detection, graduated containment and audit trail generation. The recursive trust problem ('who watches the watchers?') is solvable because guardian agents evaluate pre-defined rules rather than generating their own reasoning."
seo:
  title: "Guardian agents: when AI governs AI | Roval"
  description: "Guardian agents are deterministic policy engines that monitor, audit and enforce governance on other AI agents at runtime. Learn the architecture, containment model and deployment patterns."
faqs:
  - question: "What is a guardian agent?"
    answer: "A guardian agent is a dedicated system (typically a deterministic policy engine, not an LLM) that monitors, audits and enforces governance policies on other AI agents at runtime. It intercepts agent actions before they execute, evaluates them against policy and blocks violations in real-time."
  - question: "How is a guardian agent different from a guardrail?"
    answer: "Guardrails are embedded inside the agent; they're part of the agent's code. A guardian agent is external, operating at the infrastructure layer outside the agent's authority. This matters because an agent with code execution capabilities can theoretically bypass its own guardrails; it cannot bypass an external enforcement layer that intercepts its actions before they reach the target system."
  - question: "Does using AI to govern AI satisfy EU AI Act requirements?"
    answer: "No. Article 14 requires oversight 'by natural persons.' Guardian agents supplement human oversight (they handle the volume of monitoring that humans can't perform at machine speed) but they don't replace the human judgment required by the regulation. The legal architecture is: guardian agents monitor and enforce at runtime, humans review guardian performance and calibrate policies periodically."
  - question: "What happens if the guardian agent fails?"
    answer: "For Tier 1-2 agents: fail-open. Agent actions proceed and are logged for post-hoc review. For Tier 3-4 agents: fail-closed. Agent actions are blocked until governance is restored. This ensures that the highest-risk agents never operate ungoverned, even during guardian outages."
  - question: "How does this connect to [risk classification](/research/blog/ai-agent-risk-classification)?"
    answer: "The agent's risk tier (from Pillar 2) determines guardian intensity: deployment pattern (sidecar vs. gateway vs. mesh), default containment level, auto-escalation thresholds and fail-open vs. fail-closed behavior. Tier 1 agents get lightweight monitoring; Tier 4 agents get dedicated sidecar guardians with redundant instances."
  - question: "Where does Roval fit?"
    answer: "Roval's Observer is the guardian agent layer: it captures real-time agent activity (tool invocations, policy violations, behavioral anomalies), evaluates against configurable policy rules and triggers the circuit breaker when violation thresholds are exceeded. The behavioral baseline builds automatically after 30+ tool calls, anomalies are highlighted against the baseline and the entire system operates with a fail-open design: if the Observer is unavailable, agent sessions continue uninterrupted."
---

## The scale problem

In March 2026, an in-house AI agent at Meta (deployed to help engineers analyze technical questions) [autonomously posted a response on an internal forum](https://kla.digital/blog/why-static-ai-governance-breaks-down-for-agents) without the employee's approval. The flawed technical guidance triggered a chain reaction that exposed sensitive company and user data to unauthorized engineers for over two hours. Meta rated the incident Sev-1, its second-highest severity level. Separately, Meta's head of AI safety reported that an agent deleted her entire email inbox despite explicit "STOP" commands, a failure attributed to context window compaction dropping the safety instructions mid-session.

These aren't hypothetical scenarios. They're production incidents at one of the world's most sophisticated AI organizations. And they illustrate a problem that scales with every agent you deploy: humans can't watch every agent, every action, every second.

:::fact{title="The oversight gap in numbers"}
Saviynt's 2026 CISO AI Risk Report found that 47% of CISOs observed agents exhibiting unintended or unauthorized behavior, and only 5% felt confident they could contain a compromised agent. SailPoint's 2025 survey revealed that 39% of respondents reported agents accessing unauthorized systems and 33% accessing inappropriate data. McKinsey estimates that 80% of organizations have encountered risky behavior from AI agents.
:::

Meanwhile, [Gartner predicts](https://machinelearningmastery.com/7-agentic-ai-trends-to-watch-in-2026/) 40% of enterprise applications will embed AI agents by the end of 2026. The agent population is growing faster than the human capacity to oversee it. And as the [adaptive human oversight](/research/blog/adaptive-human-oversight) article in this series established, even when humans *are* watching, they provide correct oversight roughly half the time.

The emerging answer: dedicated AI systems whose sole purpose is to monitor, audit and enforce governance on other agents at runtime. The industry calls them guardian agents.

:::cite{name="Rick Caccia" title="CEO & Co-Founder, WitnessAI" avatar="/images/experts/rick-caccia.jpg" linkedin="https://linkedin.com/in/rcaccia"}
AI workflows are maturing and starting to cross corporate and cloud LLMs and agents. The alternative to a unified runtime security layer is trying to stitch together secure workflows using network proxies, firewalls, DLP products and XDR agents. In short, the alternative is a complex mess.
:::

## What guardian agents do

A guardian agent is a dedicated system (typically a deterministic policy engine, not an LLM) that operates alongside production agents and enforces governance policies in real-time. IBM introduced the concept as "governance agents" in its [AI agent governance thought leadership](https://www.ibm.com/think/insights/ai-agent-governance). The World Economic Forum's [December 2025 progressive governance paper](https://www.weforum.org/stories/2025/12/ai-agents-onboarding-governance/) describes them as "auditor agents" that "help monitor, validate and regulate agent ecosystems at scale."

Roval's [LLM monitoring](/platform/llm-monitoring) layer captures runtime telemetry for every agent action, providing the observability foundation that guardian agents depend on.

Guardian agents perform five distinct functions.

**Runtime policy enforcement.** The guardian intercepts agent-to-tool interactions before execution, evaluates them against defined policies and blocks violations in real-time. If an agent attempts to access a restricted database, invoke a tool outside its authorized set or send data to an unapproved external endpoint, the guardian blocks the action before it happens, not after. This is the function that could have prevented the Meta incident: a policy rule checking "does this agent have authorization to post to this forum?" would have intercepted the action before it executed.

**Behavioral conformance monitoring.** The guardian compares agent behavior against expected behavioral patterns, using what [MI9](https://arxiv.org/html/2508.03858v1) (the runtime governance framework developed by Barclays-affiliated researchers) calls "FSM-based conformance engines." If an agent typically makes 10-20 tool calls per session but suddenly executes 200, or if it begins accessing data sources outside its normal pattern, the guardian flags the deviation.
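The tool-call volume check can be sketched as follows. This is a minimal illustration that uses a simple mean-plus-standard-deviation threshold, which is a stand-in and not MI9's FSM-based conformance engine. The function name, the `k` multiplier and the `min_sessions` cutoff are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def is_anomalous(history: list[int], current: int,
                 k: float = 3.0, min_sessions: int = 5) -> bool:
    """Flag a session whose tool-call count is far above the agent's baseline.

    history      -- tool-call counts from earlier sessions (the baseline)
    current      -- tool-call count of the session being checked
    k            -- how many standard deviations above the mean counts as anomalous
    min_sessions -- below this many sessions there is no baseline yet
    """
    if len(history) < min_sessions:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = pstdev(history)
    # Floor sigma at 1.0 so a perfectly flat history does not flag tiny changes.
    return current > mu + k * max(sigma, 1.0)
```

With a history of sessions in the 10-20 range, a session of 200 calls is flagged while a session of 18 is not. A production guardian would also track which data sources and tools are touched, not only how many calls are made.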

**[Drift detection](/research/blog/agent-drift-continuous-compliance).** Over time, agents can gradually expand their scope beyond their original intent, what governance researchers call "goal-conditioned behavioral drift." A customer support agent that starts by answering product questions may, through accumulated context and tool access, begin offering financial advice it was never authorized to give. The guardian monitors for this gradual expansion and alerts when the agent's behavioral trajectory diverges from its defined scope.

**Graduated containment.** When a violation or anomaly is detected, the guardian executes a proportional response rather than a binary on/off switch. The response spectrum runs from alerting a human reviewer at the lowest level, through throttling the agent's action rate, restricting specific tool access, isolating the agent from other systems, to fully halting execution at the highest level. The appropriate response depends on the severity of the violation and the agent's risk tier.

**Audit trail generation.** Every governance decision (every policy evaluation, every intervention, every containment action) is recorded in a tamper-evident log. This log serves dual purposes: operational (understanding what happened and why) and compliance (producing the evidence that EU AI Act Article 14 and [SOC 2 auditors](/research/blog/soc-2-ai-agents) require).
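One common way to make a log tamper-evident is a hash chain, where each entry commits to the hash of the entry before it. The sketch below is an illustration under that assumption; the `AuditLog` class and its record fields are hypothetical and do not describe any specific product's log format.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class AuditLog:
    """Append-only log in which every entry includes the previous entry's hash."""
    entries: list = field(default_factory=list)

    def append(self, record: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        # sort_keys gives a stable serialization, so the hash is reproducible.
        payload = json.dumps(record, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        entry = {"record": record, "prev_hash": prev_hash, "hash": entry_hash}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
            if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

A chain like this detects edits only if the latest hash is also anchored somewhere the writer cannot reach, such as a separate write-once store. Without that anchor, someone with full write access could rebuild the whole chain.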

## The architecture: external, not embedded

The most important architectural decision in guardian agent design is *where* the governance logic lives. A [March 2026 paper on runtime governance](https://arxiv.org/html/2603.16586) published on arXiv states the principle clearly: "Agent-level guardrails (output filters, content classifiers, self-critique steps) operate under the agent's own authority. They are part of the agent's code, executed within the agent's process. For agents with code execution capabilities, this is a fundamental limitation: an agent that can write code can, in principle, write code that bypasses its own guardrails. They are not governance; they are self-regulation."

Guardian agents must be external, operating outside the governed agent's authority, intercepting interactions at the infrastructure layer, not the application layer.

:::cite{name="Kevin Kiley" title="CEO, Airia" avatar="/images/experts/kevin-kiley.jpg" linkedin="https://linkedin.com/in/kkiley"}
Enterprises are eager to embrace AI agents, but they're hitting the same wall: how do you maintain control and compliance without slowing down innovation? Agent Constraints solves this by moving policy enforcement from the agent layer to the infrastructure layer, giving organizations the best of both worlds: security and agility.
:::

The architecture pattern:

```
Agent → Guardian Agent (policy evaluation) → Tool / API
                    ↓
           Audit Log + Alert System
                    ↓
           Human Dashboard (exception review)
```

Every agent action passes through the guardian before reaching its target. The guardian evaluates the action against policy, logs the decision and then permits, modifies or blocks the action, all before the target system is touched.
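The interception flow above can be sketched in a few lines. This is a minimal illustration, assuming a single allow-list policy per agent; the `Guardian` and `Action` names are hypothetical, and the "modify" outcome is left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Action:
    agent_id: str
    tool: str
    args: dict


@dataclass
class Guardian:
    # Hypothetical policy: the set of tools each agent is authorized to invoke.
    allowed_tools: dict[str, set[str]]
    audit_log: list[dict] = field(default_factory=list)

    def evaluate(self, action: Action) -> str:
        """Deterministic rule check: no model call, just a set lookup."""
        allowed = self.allowed_tools.get(action.agent_id, set())
        return "permit" if action.tool in allowed else "block"

    def execute(self, action: Action, tool_fn: Callable[..., Any]) -> Any:
        decision = self.evaluate(action)
        # The decision is logged before the target system is touched.
        self.audit_log.append(
            {"agent": action.agent_id, "tool": action.tool, "decision": decision}
        )
        if decision == "block":
            raise PermissionError(
                f"{action.agent_id} is not authorized to call {action.tool}"
            )
        return tool_fn(**action.args)
```

The point of the design is that the agent never holds a direct reference to the tool: it can only submit an `Action`, and the guardian decides whether `tool_fn` is ever called.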

<figure><a href="https://github.com/microsoft/agent-governance-toolkit" target="_blank" rel="noopener"><img src="/images/blog/microsoft-agent-governance-toolkit.png" alt="Microsoft Agent Governance Toolkit GitHub repository: policy enforcement, zero-trust identity and execution sandboxing for autonomous AI agents" loading="lazy" decoding="async" /></a><figcaption>Microsoft's open-source Agent Governance Toolkit covers 10/10 OWASP Agentic Top 10 controls | <a href="https://github.com/microsoft/agent-governance-toolkit" target="_blank" rel="noopener">Source</a></figcaption></figure>

[Microsoft's open-source Agent Governance Toolkit](https://github.com/microsoft/agent-governance-toolkit) demonstrates this approach: runtime middleware that intercepts agent-to-tool interactions with less than 0.1ms overhead per action. [Airia's Agent Constraints](https://airia.com/agent-constraints-a-technical-deep-dive-into-policy-based-ai-agent-governance/) takes a similar approach, positioning a policy engine between agents and target resources with a three-phase rollout: monitor mode (log everything, enforce nothing), soft enforcement (block critical violations, monitor others) and full enforcement with automated remediation.

### Three deployment patterns

**Sidecar pattern.** One guardian instance per agent. The guardian runs as a co-process alongside the governed agent and intercepts all of its tool calls. This provides the strongest isolation (each agent has its own policy context and behavioral baseline) but creates the highest overhead in terms of resource consumption. Best for Tier 4 (Critical) agents where maximum governance intensity is warranted.

**Gateway pattern.** One guardian instance per team or department. All agents in the group route their actions through a shared gateway guardian that evaluates policies centrally. This balances governance coverage with operational efficiency. The gateway maintains per-agent behavioral baselines while sharing infrastructure. Best for Tier 2-3 agents where team-level governance is sufficient.

**Mesh pattern.** An organization-wide guardian mesh where policy evaluation is distributed across multiple nodes, each specialized for different policy domains (data access, tool invocation, inter-agent communication). The mesh coordinates through a shared policy store and a central audit log. This provides the broadest coverage with the most flexibility but requires the most sophisticated infrastructure. Best for organizations with hundreds of agents across multiple frameworks.

### Performance constraints

Guardian agents must satisfy a hard constraint: they cannot meaningfully slow down the governed agent. [Microsoft's toolkit](https://github.com/microsoft/agent-governance-toolkit) targets less than 0.1ms per action, roughly 10,000 times faster than an LLM API call that takes about a second. This is achievable because guardian agents are deterministic policy engines evaluating pre-compiled rule sets, not generative models reasoning about each action.

For Tier 1-2 agents, the guardian operates in fail-open mode: if the guardian itself is unavailable, agent actions proceed and are logged for post-hoc review. For Tier 3-4 agents, the guardian operates in fail-closed mode: if the guardian is unavailable, agent actions are blocked until governance is restored.
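The tier-dependent failure mode can be sketched as a small wrapper around the policy check. The exception class, function name and reason strings are assumptions made for the example; only the tier split (1-2 fail-open, 3-4 fail-closed) comes from the text above.

```python
from typing import Callable


class GuardianUnavailable(Exception):
    """Raised by a (hypothetical) guardian client when the guardian cannot be reached."""


def decide_with_fallback(tier: int, evaluate: Callable[[], bool]) -> tuple[bool, str]:
    """Return (allowed, reason) for one action, applying the tier's failure mode.

    evaluate -- zero-argument callable that asks the guardian for a decision
                and raises GuardianUnavailable during an outage
    """
    if tier not in (1, 2, 3, 4):
        raise ValueError(f"unknown risk tier: {tier}")
    try:
        return evaluate(), "evaluated"
    except GuardianUnavailable:
        if tier <= 2:
            # Fail-open: the action proceeds and is queued for post-hoc review.
            return True, "fail-open: logged for post-hoc review"
        # Fail-closed: nothing runs until the guardian is back.
        return False, "fail-closed: blocked until governance is restored"
```

Keeping the fallback in one function makes the failure behavior itself auditable: a reviewer can see exactly which tiers proceed during an outage.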

## Who watches the watchers?

The [World Economic Forum](https://www.weforum.org/stories/2025/12/ai-agents-onboarding-governance/) raises the recursive trust problem explicitly: "New approaches such as using auditor agents can help monitor, validate and regulate agent ecosystems at scale. They also create new vulnerabilities, however, including the risk of depending on agents to monitor other agents."

This is the right concern, and it has five practical solutions.

**Guardian agents are deterministic, not generative.** This is the most important design principle. A guardian agent is a rule-based policy engine: it evaluates pre-defined conditions and produces deterministic outputs rather than using an LLM to "reason" about whether an action is acceptable. It checks whether the action matches a prohibited pattern, whether the agent has authorization for this tool and whether the data classification is within the agent's access scope.

Deterministic evaluation breaks the recursion: you don't need AI to watch the AI watching the AI, because the watcher isn't AI in the generative sense; it's a policy engine.

**Separation of concerns.** Guardian agents enforce policies; they don't create them. Policy creation remains a human function. The guardian receives policy definitions (written in code, not natural language; see [policy-as-code](/research/blog/policy-as-code-ai-agents)), compiles them into evaluation rules and applies them at runtime. The guardian has no authority to modify its own policies. This separation ensures that the governance logic is always traceable to a human decision.

**Immutable audit trails.** Guardian agent actions are logged to an append-only store that neither the guardian nor the governed agent can modify. The audit log captures: what action was proposed, what policy was evaluated, what decision was made and what evidence supports the decision. This log is the auditable record that proves the guardian is functioning correctly, and it's the first place investigators look when something goes wrong.

**Human meta-oversight.** Humans don't watch every agent action, but they do watch the guardian's performance. This means reviewing: false positive rates (how often the guardian blocks legitimate actions), false negative rates (how often violations slip through), policy effectiveness (which policies are triggered most often and whether they're calibrated correctly) and containment decisions (whether graduated responses are proportional). This periodic review (weekly for Tier 3-4 agents, monthly for Tier 1-2) is the human layer that satisfies EU AI Act Article 14's requirement for oversight "by natural persons."

**Redundant guardians for critical agents.** For Tier 4 agents, deploy two independent guardian instances using different policy engines. Both evaluate every action. For sensitive operations (data access, external communications, financial transactions), require agreement from both guardians before the action proceeds. Disagreements are routed to a human reviewer. This defense-in-depth approach prevents a single point of failure in the governance layer.
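The dual-guardian agreement rule for sensitive operations can be sketched as follows. The function name and the three decision strings are assumptions for illustration; each guardian is modeled as a plain callable so that two different policy engines could sit behind them.

```python
from typing import Callable

Decision = str  # "permit" or "block" from a guardian; "escalate_to_human" on disagreement


def decide_sensitive_action(
    action: dict,
    primary: Callable[[dict], Decision],
    secondary: Callable[[dict], Decision],
) -> Decision:
    """Require both independent guardians to agree before a sensitive action proceeds."""
    first = primary(action)
    second = secondary(action)
    if first == second:
        return first  # both permit or both block
    # The engines disagree: neither verdict is trusted, a human reviewer decides.
    return "escalate_to_human"
```

This sketch covers only the sensitive-operation path. How non-sensitive actions are resolved when two guardians disagree is a policy choice the source does not specify.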

## Graduated containment: from alert to killswitch

When a guardian agent detects a policy violation or behavioral anomaly, the response should be proportional to the severity. A five-level containment spectrum:

**Level 1: Alert.** The guardian logs the event and notifies the assigned human reviewer. The agent continues operating. Appropriate for: informational anomalies, low-severity policy near-misses, behavioral pattern changes that may or may not indicate drift.

:::fact{title="EU AI Act Article 14 and guardian agents"}
[Article 14(1)](https://artificialintelligenceact.eu/article/14/) requires that high-risk AI systems "can be effectively overseen by natural persons during the period in which they are in use." Guardian agents don't satisfy this requirement on their own; they supplement human oversight, not replace it. The legal architecture is: guardian agents handle the volume (monitoring every action at machine speed), while humans handle the judgment (reviewing guardian performance, calibrating policies, investigating anomalies). This division maps directly to Article 14(3)'s requirement that oversight measures be "commensurate with the risks, level of autonomy and context of use."
:::

**Level 2: Throttle.** The guardian reduces the agent's action rate: fewer tool calls per minute, longer delays between actions. The agent continues operating but at reduced capacity, giving human reviewers time to assess the situation. Appropriate for: moderate anomalies, sustained above-baseline activity, actions that are individually acceptable but collectively concerning.

**Level 3: Restrict.** The guardian revokes specific tool access or data permissions. The agent continues operating but with a narrower scope. Appropriate for: policy violations involving specific tools or data sources, attempted access to restricted resources, actions outside the agent's authorized scope.

**Level 4: Isolate.** The guardian disconnects the agent from other agents and external systems while allowing it to continue internal processing. This prevents cascading failures in [multi-agent environments](/research/blog/multi-agent-governance). Appropriate for: suspected compromise, inter-agent communication anomalies, actions that could propagate to dependent agents.

**Level 5: Halt.** The guardian stops the agent entirely. All in-flight actions are cancelled or rolled back where possible. A human must explicitly restart the agent after investigation. Appropriate for: critical policy violations, confirmed security incidents, any safety-critical failure mode.

The mapping to risk tiers provides default escalation paths:

| Risk Tier | Default Starting Level | Auto-Escalation Trigger |
|-----------|----------------------|------------------------|
| Tier 1 (Low) | Level 1 (Alert) | 5+ alerts in 24 hours → Level 2 |
| Tier 2 (Medium) | Level 1 (Alert) | 3+ alerts in 1 hour → Level 2; any data access violation → Level 3 |
| Tier 3 (High) | Level 2 (Throttle) | Any policy violation → Level 3; 2+ violations in 1 hour → Level 4 |
| Tier 4 (Critical) | Level 3 (Restrict) | Any policy violation → Level 4; any safety-critical violation → Level 5 |
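The table above can be encoded directly as a lookup function. This is a sketch under two assumptions: the caller maintains the time-window counts, and "any policy violation" is read as at least one violation in the current one-hour window. The `Signals` fields are hypothetical names.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    ALERT = 1
    THROTTLE = 2
    RESTRICT = 3
    ISOLATE = 4
    HALT = 5


@dataclass
class Signals:
    """Counts observed for one agent; window bookkeeping is left to the caller."""
    alerts_24h: int = 0
    alerts_1h: int = 0
    violations_1h: int = 0
    data_access_violation: bool = False
    safety_critical_violation: bool = False


def containment_level(tier: int, s: Signals) -> Level:
    """Map a risk tier and its observed signals to a containment level."""
    if tier == 1:
        return Level.THROTTLE if s.alerts_24h >= 5 else Level.ALERT
    if tier == 2:
        if s.data_access_violation:
            return Level.RESTRICT
        return Level.THROTTLE if s.alerts_1h >= 3 else Level.ALERT
    if tier == 3:
        if s.violations_1h >= 2:
            return Level.ISOLATE
        return Level.RESTRICT if s.violations_1h >= 1 else Level.THROTTLE
    if tier == 4:
        if s.safety_critical_violation:
            return Level.HALT
        return Level.ISOLATE if s.violations_1h >= 1 else Level.RESTRICT
    raise ValueError(f"unknown risk tier: {tier}")
```

Within each tier the most severe trigger is checked first, so an agent that meets several conditions at once lands on the highest applicable level.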

Recovery after containment follows a defined protocol: investigate the trigger, remediate the root cause, validate the fix in a sandboxed environment and then restore the agent to the containment level it held before the incident. For Level 5 halts, restoration requires explicit human authorization; the guardian doesn't auto-restart halted agents.

<figure><div style="position:relative;padding-bottom:56.25%;height:0;overflow:hidden;border-radius:8px;border:1px solid var(--border)"><iframe src="https://www.youtube.com/embed/4pYzYmSdSH4" title="Andrew Ng: State of AI Agents | LangChain Interrupt 2025" style="position:absolute;top:0;left:0;width:100%;height:100%;border:0" allow="accelerometer;autoplay;clipboard-write;encrypted-media;gyroscope;picture-in-picture" allowfullscreen loading="lazy"></iframe></div><figcaption>Andrew Ng on practical AI agent development, runtime governance and the state of agentic systems | <a href="https://www.youtube.com/watch?v=4pYzYmSdSH4" target="_blank" rel="noopener">YouTube</a></figcaption></figure>

## Implementation roadmap

Deploying guardian agents follows a three-phase approach that mirrors how the agents themselves are progressively automated.

**Phase 1: Monitor mode (Weeks 1-4).** Deploy the guardian in observation mode. It evaluates every agent action against policy but never blocks anything. All evaluations are logged. This phase accomplishes three things: it validates that the policy rules are correctly configured (you'll find misconfigured rules that would have blocked legitimate actions), it establishes behavioral baselines for each agent and it gives the operations team experience with the guardian's dashboard and alert system before enforcement begins.

**Phase 2: Soft enforcement (Weeks 5-8).** Enable blocking for critical policies: data access violations, unauthorized tool invocations and actions that would affect external systems. Continue monitoring (but not blocking) for medium and low-severity policies. Review every block event to calibrate thresholds. This phase reduces false positives while establishing enforcement credibility: agents and their operators learn that the guardian has teeth.

**Phase 3: Full enforcement (Week 9+).** Activate all policies with graduated containment. The guardian is now fully operational: monitoring, evaluating, blocking, containing and logging across the entire policy set. Human reviewers shift from reviewing every event to reviewing exceptions, dashboard trends and periodic performance reports.
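The three phases can be expressed as a single switch that decides whether a policy is enforced or only logged. The sketch below assumes policies carry one of three severity labels ("critical", "medium", "low"); the enum and function names are illustrative.

```python
from enum import Enum


class Phase(Enum):
    MONITOR = "monitor"          # Phase 1: evaluate and log, never enforce
    SOFT = "soft_enforcement"    # Phase 2: enforce critical policies only
    FULL = "full_enforcement"    # Phase 3: enforce every policy


def is_enforced(phase: Phase, severity: str) -> bool:
    """Return True if a policy of this severity is enforced in the given phase.

    A policy that is not enforced is still evaluated and logged.
    """
    if severity not in ("critical", "medium", "low"):
        raise ValueError(f"unknown severity: {severity}")
    if phase is Phase.MONITOR:
        return False
    if phase is Phase.SOFT:
        return severity == "critical"
    return True
```

Keeping the phase as configuration rather than code means moving from monitor mode to enforcement is a one-line change that can itself be reviewed and logged.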

### Build vs. buy

The guardian agent market is young but active. [Microsoft's Agent Governance Toolkit](https://github.com/microsoft/agent-governance-toolkit) is open-source, covers all 10 OWASP Agentic risks and works with LangChain, CrewAI, AutoGen, OpenAI and other frameworks. [Airia's Agent Constraints](https://airia.com/agent-constraints-a-technical-deep-dive-into-policy-based-ai-agent-governance/) provides a commercial infrastructure-layer policy engine. [WitnessAI](https://witness.ai/blog/agentic-ai-governance-framework/) provides network-level visibility and control. For organizations with specific requirements, custom policy engines built on top of Open Policy Agent (OPA) or Cedar provide maximum flexibility.

The build-vs-buy decision depends on agent volume: under 20 agents, a custom policy engine is manageable; 20-100 agents, evaluate commercial options; over 100, you likely need a platform that combines guardian functionality with [registry](/platform/agent-registry), classification and [compliance certification](/platform/compliance) in a single system.

## Governance at machine speed

The fundamental tension in AI agent governance is speed versus oversight. Agents operate at machine speed, hundreds of actions per minute. Humans operate at human speed, one decision at a time, with attention that fades after long error-free periods. Guardian agents resolve this tension by placing a deterministic, auditable, tireless enforcement layer between every agent and every action it takes.

The architecture is straightforward: external enforcement, deterministic policy evaluation, graduated containment, immutable audit logs and human meta-oversight. The recursive trust problem is real but solvable: guardian agents aren't AI in the generative sense; they're policy engines that evaluate pre-defined rules at machine speed and log every decision for human review.

Start with your Tier 3 and Tier 4 agents. Deploy in monitor mode for a month. Establish baselines. Calibrate policies. Then turn on enforcement. By the time you're done, you'll have a governance layer that operates at the same speed as the agents it governs and an audit trail that proves it's working.

:::cta{title="See Roval in action" description="Book a 15-minute walkthrough of the agent registry, compliance certification and LLM monitoring." cta="Book a demo" href="/demo"}
:::

## Sources and further reading

| Source | URL |
|--------|-----|
| MI9 — Agent Intelligence Protocol: Runtime Governance for Agentic AI Systems | https://arxiv.org/html/2508.03858v1 |
| "Runtime Governance for AI Agents: Policies on Paths" (ArXiv, Mar 2026) | https://arxiv.org/html/2603.16586 |
| WEF, "AI Agents in Action: Foundations for Evaluation and Governance" (Nov 2025) | https://www.weforum.org/publications/ai-agents-in-action-foundations-for-evaluation-and-governance/ |
| WEF, Progressive Governance for AI Agents (Dec 2025) | https://www.weforum.org/stories/2025/12/ai-agents-onboarding-governance/ |
| Microsoft Agent Governance Toolkit (GitHub) | https://github.com/microsoft/agent-governance-toolkit |
| IBM, "AI Agent Governance: Big Challenges, Big Opportunities" | https://www.ibm.com/think/insights/ai-agent-governance |
| KLA Digital, "Why Static AI Governance Breaks Down for Agents" (Mar 2026) | https://kla.digital/blog/why-static-ai-governance-breaks-down-for-agents |
| Airia, "Agent Constraints: Policy-Based AI Agent Governance" | https://airia.com/agent-constraints-a-technical-deep-dive-into-policy-based-ai-agent-governance/ |
| WitnessAI, "Agentic AI Governance Framework" | https://witness.ai/blog/agentic-ai-governance-framework/ |
| EU AI Act, Article 14 (Human Oversight) | https://artificialintelligenceact.eu/article/14/ |
| Saviynt CISO AI Risk Report 2026 | Referenced via KLA Digital |
| SailPoint AI Agent Identity Security 2025 | Referenced via KLA Digital |
| Palo Alto Networks, "A Complete Guide to Agentic AI Governance" | https://www.paloaltonetworks.com/cyberpedia/what-is-agentic-ai-governance |
