Policy-as-code for AI agents: from documents to enforcement

Your governance policies are not governing anything#

Somewhere in your organization there is a document. It might be a PDF in Confluence, a page in SharePoint or a slide deck that was last reviewed in Q3. It says things like “AI agents must not access production databases without approval” and “all agent actions involving customer PII require human review.” It was written by a committee, signed by a VP and is technically your AI governance policy.

Here is the problem: your AI agents have never read it.

An AI agent executing tool calls at the rate of dozens per minute does not pause to consult a governance document. It doesn’t check a wiki. It doesn’t ask whether the action it’s about to take is consistent with what the compliance team agreed to six months ago. It acts. And by the time a human reviews what happened (if anyone reviews at all), the action has already been taken, the data has already been accessed and the email has already been sent.

This is the enforcement gap at the heart of enterprise AI governance. Organizations invest heavily in defining policies but almost nothing in making those policies executable. The result is governance theatre: documentation that satisfies a committee but constrains nothing in production.

Policy-as-code closes that gap. It takes the behavioral boundaries that your governance team defines and expresses them as machine-readable, machine-enforceable rules that are evaluated in real-time, at the moment an agent acts.

Forrester’s 2026 survey of 500 enterprises deploying AI agents found that 71% lack a formal governance framework for autonomous agents, even as 64% plan to increase agent autonomy within 12 months. Financial institutions using policy-as-code have achieved reductions of 90% or more in compliance violations stemming from human error, particularly in high-volume transaction monitoring where manual review is impractical.

What policy-as-code means in practice#

Policy-as-code is the practice of expressing governance rules (access controls, behavioral boundaries, compliance requirements, authorization thresholds) in a formal, machine-readable language that can be version-controlled, tested and enforced automatically.

The concept isn’t new. Infrastructure teams have used policy-as-code for years to govern cloud resources. Open Policy Agent (OPA) with its Rego language is the industry standard for Kubernetes admission control, API authorization and infrastructure compliance. AWS’s Cedar language provides formally verified policy evaluation for access control. Google’s Common Expression Language (CEL) handles lightweight, embeddable rule evaluation.

What is new is applying these techniques to AI agents: systems whose actions are non-deterministic, context-dependent and potentially unbounded.

The distinction from traditional guardrails matters. Model-level safety features filter content during inference; they prevent a language model from generating harmful text. Policy-as-code operates at the action layer, where agents interact with your systems, databases and infrastructure. A model can be trained not to generate harmful text, but that doesn’t prevent it from executing a destructive database query when asked to “clean up old data.” Policy-as-code catches the action, not the thought.

Responsible governance requires strategies for embedding ethical constraints in AI design, the role of explainability in agent decision-making and how multi-agent coordination informs regulatory compliance. The real challenge is embedding ethical and operational constraints directly into system design, not bolting them on after deployment.

Virginia Dignum made this argument in her AAMAS 2025 keynote, and it frames the case for policy-as-code precisely. Policy-as-code is the operational mechanism that makes this embedding concrete.

The three enforcement layers#

Effective policy enforcement for AI agents requires governance at three distinct points in the agent lifecycle. Each layer catches different types of violations, and no single layer is sufficient on its own.

Layer 1: Build-time enforcement (shift-left governance)#

Build-time enforcement applies before an agent ever touches production data. It’s the “shift-left” equivalent for agent governance, catching policy violations during development and testing, not after deployment.

What build-time enforcement validates:

  • Does the agent have an assigned owner and accountability chain?
  • Has it been risk-classified across data sensitivity, decision authority and blast radius?
  • Are its tool permissions scoped to least-privilege?
  • Has it been tested against adversarial inputs that probe its behavioral boundaries?
  • Is its deployment configuration consistent with the approved policy for its risk tier?
  • Has a pre-deployment impact assessment been completed?

Singapore’s Infocomm Media Development Authority (IMDA) published the world’s first governance framework for agentic AI in January 2026, presented at the World Economic Forum in Davos. The framework explicitly requires organizations to “assess and bound risks upfront prior to deployment,” including narrowing the agent’s “action-space” by limiting access to tools and external systems before deployment, not after an incident. The WEF’s own “AI Agents in Action” framework (November 2025) echoes this, recommending pre-deployment testing to validate “baseline safety and reliability, execution accuracy, policy adherence and tool use.”

Build-time enforcement is where policy-as-code integrates with your CI/CD pipeline. Policy rules are evaluated as part of the agent’s deployment process, just as infrastructure-as-code is validated before a Terraform plan is applied.

Singapore’s IMDA framework defines four dimensions for pre-deployment risk assessment of agentic AI: (1) assess and bound risks upfront: narrow the agent’s action-space before deployment, (2) increase human accountability: define who is responsible for the agent’s actions, (3) implement technical controls: testing, access controls and monitoring mechanisms, (4) enable end-user responsibility: provide transparency about the agent’s capabilities and limits. Although voluntary, the framework is already shaping supervisory expectations in APAC and is being studied by European regulators as a model for agent-specific governance.

Layer 2: Deploy-time enforcement (the certification gate)#

Deploy-time enforcement is the gate between staging and production. It’s the moment where an agent’s readiness is validated against its risk tier before it’s allowed to operate in the live environment.

For low-risk agents (a knowledge base Q&A bot touching only public data), the gate might be a lightweight automated check. For critical agents (a trading system, a clinical decision support tool, an agent with access to customer financial records), the gate requires explicit certification: all compliance requirements mapped, all evidence uploaded, all stakeholder sign-offs recorded.

The principle is borrowed from infrastructure: you don’t promote a container to production without passing a security scan. You shouldn’t promote a high-risk AI agent to production without passing a governance certification.

This is where policy-as-code intersects with continuous certification. The certification gate is not a one-time check; it’s re-evaluated whenever the agent’s configuration changes, whenever its dependencies update and whenever its existing certification expires. (For more on how continuous certification works, see our cornerstone article on the 8 pillars of AI agent governance, specifically Pillar 4.)

Layer 3: Runtime enforcement (real-time policy evaluation)#

Runtime enforcement is where policy-as-code delivers its most visible value. Every action an agent attempts (every tool call, every API request, every database query) is intercepted, evaluated against the active policy and either permitted or denied before execution.

The architecture follows a well-established pattern from API gateway security:

Policy Decision Point (PDP): The engine that evaluates an agent’s requested action against the active policy rules. It receives the action context (what the agent wants to do, what data it’s accessing, what user initiated it) and returns a decision: PERMIT or DENY.

Policy Enforcement Point (PEP): The gateway that intercepts the agent’s action, queries the PDP and either allows the action to proceed or blocks it. The PEP sits at the chokepoint between the agent and the tools it uses.

AWS shipped exactly this architecture in December 2025 with Policy in AgentCore. Their implementation uses Cedar to evaluate every tool call at the gateway layer, intercepting all agent traffic, evaluating it against deterministic Cedar policies and returning a binary PERMIT or DENY before any tool executes. The latency budget is single-digit milliseconds. The decision is provable and auditable. Policy in Amazon Bedrock AgentCore became generally available on March 3, 2026.

AWS re:Invent 2025, Architecting scalable and secure agentic AI with Bedrock AgentCore | YouTube

Every policy evaluation is logged, creating the audit trail that the EU AI Act Article 12 requires for high-risk AI systems.

Choosing a policy language#

Not all policy languages are created equal. Each was designed for different evaluation characteristics, and the right choice depends on what you’re enforcing.

Cedar Policy Language playground, AWS's open-source, formally verified language for fine-grained authorization
The Cedar playground lets you test policy rules interactively before deploying them to production | Source

Cedar (AWS)#

Cedar is a declarative, formally verified policy language designed for access control. AWS open-sourced it in 2023 and added it to AgentCore in 2025 for AI agent governance.

Cedar’s defining property is formal verification. AWS mathematically proves properties about Cedar’s authorization engine using automated reasoning, giving high assurance of correctness and security. Policies are deterministic: the same input always produces the same decision. They’re version-controllable (they live in Git), testable (they run through CI/CD) and fast (millisecond evaluation).

Best for: Tool invocation permissions, financial thresholds, role-based agent access, data access controls. Anywhere you need provable, deterministic enforcement.

Rego, the open policy agent language#

Rego is the policy language for Open Policy Agent (OPA), the most widely adopted general-purpose policy engine. OPA decouples policy decisions from application logic, enabling fine-grained access control, data filtering and compliance checks.

Rego’s strength is adoption breadth. It’s already deployed across Kubernetes admission control, API authorization, infrastructure compliance and now AI agent governance. OPA has native integration with Envoy’s External Authorization API and has recently added AI API examples for controlling access to generative AI endpoints.

Best for: Data access policies, model allowlists, deployment gates, Kubernetes-native environments. Anywhere you already have OPA in your stack.

Open Policy Agent (OPA), the most widely adopted general-purpose policy engine
Open Policy Agent has become the industry standard for decoupling policy decisions from application logic | Source

CEL, Google’s common expression language#

CEL is a lightweight expression language designed for embedding in larger systems. It’s deterministic, sandboxed and evaluates in microseconds, making it ideal for high-volume, low-latency policy checks.

Best for: Blocked command patterns, file path restrictions, argument validation, pattern matching on tool call parameters. Anywhere you need fast, simple rule evaluation without the overhead of a full policy engine.

Natural language (AI-evaluated policies)#

Natural language policies use an LLM to evaluate whether an agent’s intended action is consistent with a governance rule expressed in plain English. “Don’t allow any operation that modifies production data” is evaluated by an AI model, not a deterministic engine.

Best for: Complex ethical boundaries, intent-based policies, edge cases where deterministic rules can’t capture the nuance. Use sparingly; AI-evaluated policies are non-deterministic and harder to audit.

The critical design insight: production systems need both deterministic and AI-evaluated policies. Deterministic for hard boundaries: “never execute financial transactions above 10,000 EUR without approval,” “never access the customer PII table outside of approved workflows.” AI-evaluated for contextual judgment: “is this agent’s behavior consistent with its intended purpose?” The deterministic layer handles the 95% of cases that are clear-cut. The AI layer handles the 5% that require interpretation.

LanguageTypeVerificationLatencyAgent governance use case
CedarDeclarative, formally verifiedMathematical proof~1msTool permissions, financial thresholds, RBAC
RegoDeclarative, general-purposeTest suites~1-5msData access, deployment gates, model allowlists
CELExpression languageSandbox isolation<1msCommand blocking, argument validation, pattern matching
Natural languageAI-evaluatedNot provable100ms+Intent boundaries, ethical constraints, edge cases

Mapping policy-as-code to the EU AI Act#

The EU AI Act is the most consequential regulatory driver for policy-as-code adoption in Europe. Several articles translate directly into enforcement requirements that policy-as-code implements.

EU AI Act requirementArticleHow policy-as-code enforces it
Risk management systemArt. 9Build-time: pre-deployment policy checks validate agent against risk tier requirements
Technical documentationArt. 11Build-time: policy-as-code is the documentation, version-controlled, auditable, machine-readable
Automatic event loggingArt. 12Runtime: every policy evaluation logged with action, decision, context, timestamp, actor
Transparency obligationsArt. 13Runtime: policy decisions are explainable; the rule that triggered a block is identified and logged
Human oversightArt. 14Runtime: escalation policies route high-risk actions to human review queues
Accuracy and robustnessArt. 15Deploy-time: adversarial testing against policy boundaries before production promotion
Post-market monitoringArt. 62Ongoing: policy evaluation logs feed continuous compliance monitoring

The ENISA Advisory Group’s 2025 opinion paper on cybersecurity for AI recommended that organizations adopt “Secure by Design, Secure by Default, Secure in Operations” principles for AI systems and specifically called for ENISA to “provide AI cybersecurity baselines, defining minimum security standards for AI systems, products and services.” Policy-as-code is the mechanism that makes “Secure by Default” operational: policies are enforced automatically unless explicitly overridden, not the other way around.

The EU’s General-Purpose AI Code of Practice, published July 2025, adds a further dimension. The Code’s Safety and Security chapter outlines practices for managing systemic risks from the most advanced models. For enterprises deploying agents built on these models, policy-as-code provides the control layer between the model provider’s safety measures and the enterprise’s own governance requirements.

Non-compliance with the EU AI Act carries penalties that scale with severity: up to 35 million EUR or 7% of global annual turnover for the most severe violations (prohibited AI practices) under Article 99, up to 15 million EUR or 3% of turnover for violations of other obligations and up to 7.5 million EUR or 1% of turnover for supplying incorrect information. For an enterprise with 1 billion EUR in annual revenue, the maximum exposure is 70 million EUR.

The implementation playbook covers the full path from audit-only observation to production enforcement in seven steps, with completion checklists, a policy language decision framework (Cedar vs Rego vs CEL vs natural language) and a complete EU AI Act Article-to-enforcement mapping.

A step-by-step guide from audit-only to production enforcement, with checklists for each step, a policy language comparison table and a complete EU AI Act Article-to-enforcement mapping.

What happens when a policy fails#

Policy enforcement introduces a new category of operational questions. What happens when the enforcement layer itself encounters an error? When a policy evaluation times out? When the policy engine crashes?

Fail-open vs fail-closed#

Fail-closed means that when the policy engine is unavailable, all agent actions are blocked. This is the safer default for critical agents. A trading agent that can’t verify its policy should not execute trades. The risk is operational: if the policy engine goes down, the agent stops working entirely.

Fail-open means that when the policy engine is unavailable, agent actions are allowed to proceed (with logging). This is appropriate for lower-risk agents where operational continuity matters more than perfect enforcement. A knowledge base Q&A bot shouldn’t stop answering questions because the policy engine is temporarily degraded. The risk is governance: during the outage window, actions are unvalidated.

The right default depends on risk tier. Critical and high-risk agents should fail-closed. Medium and low-risk agents can fail-open with enhanced logging. This mirrors Roval’s own architecture: the LLM proxy is designed to fail-open (developer sessions never break), while the production gate for high-risk agent certification is fail-closed (uncertified agents are blocked, period).

The OWASP Top 10 for Agentic Applications identifies tool misuse, goal hijacking and supply chain vulnerabilities as top-tier risks specific to agents. Securing autonomous systems requires policy enforcement at the action layer, not just content filtering at the model layer.

Circuit breaker pattern#

When an agent triggers an unusual volume of policy violations (suggesting it’s operating outside its intended parameters), a circuit breaker trips. All further tool calls are blocked until an administrator reviews the violation log and manually resets the breaker.

The circuit breaker catches scenarios that individual policy rules might miss: an agent that technically stays within each individual boundary but collectively exhibits behavior that no one intended. A procurement agent that issues 200 purchase orders in an hour, each under the approval threshold, is technically compliant with each individual rule, but the aggregate behavior signals a problem that the circuit breaker catches.

Incident response integration#

Policy violations should route directly into your incident management workflow. When a critical policy is triggered (a blocked data access attempt, a denied financial transaction, a circuit breaker trip), the alert goes to ServiceNow, PagerDuty or whatever system your operations team uses.

The policy evaluation log provides the forensic evidence: what action was attempted, what rule blocked it, what data would have been accessed and who or what initiated the request. This is the audit trail that compliance teams need and that EU AI Act Article 12 requires.

Implementation playbook: from audit-only to production enforcement#

Moving from policy documents to running enforcement is a seven-step process that takes most teams 6-8 weeks for their first set of agents. The sequence matters: you start by observing what agents do, define hard boundaries based on real data, layer in graduated policies, version-control everything in Git, test in CI/CD, enable enforcement with intensive false-positive monitoring and then iterate as agents and regulations evolve.

The full playbook (including detailed actions, completion checklists, a policy language decision framework and an EU AI Act compliance mapping table) is available as a downloadable PDF.

The document is not the governance#

There is a clarifying question that every enterprise should ask about its AI agent governance program: if we deleted every governance document tomorrow, what would change about how our agents behave?

If the answer is “nothing” (because the policies exist only as documentation and not as running code), then the governance program is aspirational, not operational.

Policy-as-code makes governance real. It takes the rules your compliance team defines and turns them into enforcement that runs at the speed your agents operate. Build-time checks catch violations before deployment. Deploy-time gates prevent uncertified agents from reaching production. Runtime enforcement blocks unauthorized actions at the moment they’re attempted.

The document is not the governance. The code is.

Roval implements policy-as-code enforcement across all three layers. Build-time: the agent registry blocks high-risk agents from production without active certification. Runtime: the Observer captures every tool call and evaluates policy violations within seconds. The LLM monitor captures every prompt and flags prohibited patterns. All violations are logged to the immutable audit trail, exportable as CSV or JSON for auditors, turning compliance certification from a periodic exercise into a continuous, evidence-generating process.

Sources and further reading#

SourceURL
Open Policy Agent (OPA)https://www.openpolicyagent.org
AWS Cedar Policy Languagehttps://www.cedarpolicy.com
AWS Policy in AgentCorehttps://byteiota.com/aws-policy-agentcore-cedar-language-secures-ai-agents/
NexaStack, “Agent Governance at Scale”https://www.nexastack.ai/blog/agent-governance-at-scale
Singapore IMDA, Agentic AI Governance Framework (Jan 2026)https://www.roedl.com/en/insights/singapore-model-ai-governance-framework/
WEF, “AI Agents in Action” (Nov 2025)https://www.weforum.org/publications/ai-agents-in-action-foundations-for-evaluation-and-governance/
WEF, Progressive Governance for AI Agents (Dec 2025)https://www.weforum.org/stories/2025/12/ai-agents-onboarding-governance/
Kellton, Enterprise Agentic AI Architecture 2026https://www.kellton.com/kellton-tech-blog/enterprise-agentic-ai-architecture
EU AI Act, Article 9 (Risk Management)https://artificialintelligenceact.eu/article/9/
EU AI Act, Article 12 (Record-keeping)https://artificialintelligenceact.eu/article/12/
EU AI Act, Article 99 (Penalties)https://artificialintelligenceact.eu/article/99/
EU GPAI Code of Practice (Jul 2025)https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai
EIOPA, Opinion on AI Governance and Risk Managementhttps://www.eiopa.europa.eu/
ENISA Advisory Group, Opinion Paper on Cybersecurity for AI (2025)https://www.enisa.europa.eu/
Virginia Dignum, AAMAS 2025 Keynotehttps://www.ifaamas.org/Proceedings/aamas2025/pdfs/p1.pdf
Forrester, AI Agent Governance Gap 2026 (via Thinking.inc)https://thinking.inc/en/blue-ocean/agentic/enterprise-agent-governance/
MaybeDont.ai, AI Guardrailshttps://maybedont.ai/solutions/ai-guardrails/
Reco.ai, Guardrails for AI Agents Guidehttps://www.reco.ai/hub/guardrails-for-ai-agents