Agent drift: why your certified agent is already out of compliance

The certification snapshot problem

You certified your customer support agent 90 days ago. Since then, the LLM provider pushed a model update, a developer connected a new customer database, the agent’s original owner left the company, and the agent’s response patterns have shifted in ways nobody measured. Is it still compliant?

The honest answer is: you don’t know. And neither does your auditor.

This is the fundamental problem with point-in-time compliance for AI agents. A certification is a snapshot: a statement that at this moment, this agent meets these requirements. The moment after certification, changes begin accumulating. Each one is small. None individually triggers a formal re-assessment. But collectively, they can shift an agent’s risk profile from compliant to exposed without anyone noticing.

Kyndryl’s Ismail Amla defines this phenomenon as “agentic drift”: when “an AI agent can appear reliable while working toward unwanted outcomes due to a gradual separation from the agent operator’s original intention or goal.” The term captures something specific to agents that traditional compliance frameworks don’t address: the gap between what an agent was certified to do and what it’s doing right now.

The distinction between model drift and agent drift matters. Model drift is a statistical phenomenon: the distribution of input data changes and the model’s predictions become less accurate. Agent drift is an operational phenomenon. The agent’s actions change: it starts calling different tools, accessing different data, making different decisions. A model can drift silently. An agent drifts visibly, if you’re watching.

Most organizations aren’t watching. Governance teams certify agents at deployment time, add them to a spreadsheet and move on. The agent keeps running, accumulating changes and shifting behavior. Nobody re-checks until the next audit, and by then the gap between the certification record and the agent’s current state can be months wide.

The numbers confirm this. They show a wide gap between what leaders believe and what their governance infrastructure can deliver.

A 2026 AGAT Software analysis found that 82% of executives believe their existing policies protect against unauthorized agent actions, yet only 24.4% of organizations have full visibility into which agents are communicating with each other (citing Gravitee’s 2026 survey). The gap between perceived and actual governance is where compliance failures live.

That 82% confidence figure is particularly concerning because it suggests organizations are building on a false sense of security. Executives believe existing controls are sufficient, yet those controls were designed for traditional software, not for autonomous agents that make their own decisions about tool use and data access.

Point-in-time compliance was designed for static systems: servers, databases, applications that change through controlled change management processes. Agents aren’t static. They make runtime decisions, interact with other agents and operate in environments that shift continuously. The compliance model has to shift with them.

Three types of agent drift

Not all drift is the same. Understanding the type determines how you detect it and what you do about it.

Certification drift

The agent hasn’t changed. The world has. A certification expires because time passed, a new regulation was published, a compliance framework was updated or a new requirement was added to an existing framework. The certification was valid when it was issued, but the standard it was measured against has moved.

Detection: Calendar-based (expiry dates tracked automatically) plus event-driven (monitoring for regulatory changes, framework updates and new internal policy requirements).

Remediation: Re-certify the agent against the updated framework. No technical changes to the agent are needed, but the evidence base may need to be refreshed and new requirements may need to be addressed.

Example: Your agent was certified against GDPR requirements in January. In August, most of the EU AI Act’s obligations for high-risk systems begin to apply (August 2, 2026). Your existing GDPR certification doesn’t cover the new Article 14 human oversight requirements. The agent hasn’t changed, but the compliance requirements have.

Configuration drift

The agent was modified after certification. Someone changed its technical configuration: updated the model version, connected a new data source, broadened tool permissions, changed a dependency. The certification no longer reflects the agent’s current state.

Detection: Diff-based comparison of the agent’s current configuration against the configuration that was certified. Every state change is captured as a before/after snapshot in the audit log. When the current state diverges from the certified state, a drift event is created.

Remediation: Assess whether the change is material (does it affect the agent’s risk classification?). If yes, re-classify the agent using the three-dimension scoring model, then re-certify against the applicable framework. If no, document the change and update the certification record without a full re-certification.

Example: Your procurement agent was certified as Tier 2 (Medium risk) with access to the vendor database and purchase order system. A developer connects it to the employee payroll system to enable expense reconciliation. The agent now has access to restricted PII (salary data). Its data sensitivity score jumps from 2 (Internal) to 4 (Restricted), pushing its composite risk tier from 2 to 3. The original certification is no longer valid. The agent needs re-classification and re-certification with a 180-day expiry instead of 365.
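
The diff-based detection described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the configuration keys, the `MATERIAL_FIELDS` set and the `diff_config` helper are hypothetical names, and which fields count as material would come from your own policy.

```python
from dataclasses import dataclass

# Hypothetical: fields whose change is assumed to affect risk classification.
MATERIAL_FIELDS = {"model_version", "data_sources", "tool_permissions", "data_sensitivity"}


@dataclass(frozen=True)
class DriftEvent:
    key: str
    before: object
    after: object
    material: bool


def diff_config(certified: dict, current: dict) -> list[DriftEvent]:
    """Return one drift event per key that differs from the certified state."""
    events = []
    for key in sorted(certified.keys() | current.keys()):
        before, after = certified.get(key), current.get(key)
        if before != after:
            events.append(DriftEvent(key, before, after, key in MATERIAL_FIELDS))
    return events


# The procurement agent from the example, before and after the payroll connection.
certified = {
    "model_version": "v1",  # placeholder value
    "data_sources": ["vendor_db", "purchase_orders"],
    "data_sensitivity": 2,  # Internal
    "tags": ["procurement"],
}
current = {
    "model_version": "v1",
    "data_sources": ["vendor_db", "purchase_orders", "payroll"],
    "data_sensitivity": 4,  # Restricted
    "tags": ["procurement"],
}

events = diff_config(certified, current)
for event in events:
    label = "material" if event.material else "minor"
    print(f"{event.key}: {event.before} -> {event.after} ({label})")
```

Here the diff reports two material changes (`data_sensitivity` and `data_sources`), which is the signal that would start re-classification in the example.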

Behavioral drift

The agent’s configuration hasn’t changed, but its runtime behavior has diverged from the patterns observed during certification. It’s making different tool calls, accessing different data patterns, operating at different volumes or producing outputs that don’t match its established baseline.

This is the hardest type to detect because nothing in the agent’s configuration explains it. The causes are often environmental: distributional shifts in the input data the agent processes, accumulated context that alters its decision-making patterns, interactions with other agents that weren’t present during certification or subtle changes in the APIs and services it depends on.

Detection: Behavioral baseline comparison, the same mechanism used by guardian agents to monitor runtime behavior. Compare current call rate, tool frequency distribution, edit size patterns and error rates against the baseline established during the certification period. When metrics deviate beyond a configurable threshold, a drift event is created.

Remediation: Investigate the root cause. If the behavioral change is benign (the agent is handling a different mix of requests due to seasonal patterns), update the baseline and document the change. If the behavioral change indicates a problem (the agent is accessing data sources outside its authorized scope), restrict the agent and investigate before re-certifying with an updated baseline.

Example: Your customer support agent was certified with a behavioral baseline showing an average of 15 tool calls per session, 80% of which were knowledge base lookups. Over the past month, the average has climbed to 35 tool calls per session, with 40% now being CRM record updates. The configuration hasn’t changed, but the agent’s behavior has shifted significantly, possibly because the underlying model’s updated version is more action-oriented. This shift may have moved the agent from advisory (Tier 1 decision authority) to supervised (Tier 3), requiring a completely different governance regime.
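
A baseline comparison of this kind can be sketched as follows. The sketch makes several assumptions that are not in the text: the thresholds (50% change in call rate, 0.2 distance in tool mix) are arbitrary, total variation distance is just one possible way to compare tool distributions, and the tool shares other than the 80% knowledge base and 40% CRM figures are invented to complete the example.

```python
def relative_change(baseline: float, current: float) -> float:
    """Fractional change from the baseline value (1.0 means +100%)."""
    return (current - baseline) / baseline


def mix_distance(baseline: dict[str, float], current: dict[str, float]) -> float:
    """Total variation distance between two tool-usage distributions (0 to 1)."""
    tools = baseline.keys() | current.keys()
    return 0.5 * sum(abs(baseline.get(t, 0.0) - current.get(t, 0.0)) for t in tools)


def check_behavior(baseline: dict, current: dict,
                   rate_threshold: float = 0.5, mix_threshold: float = 0.2) -> list:
    """Return the metrics that deviate beyond the (assumed) thresholds."""
    deviations = []
    rate = relative_change(baseline["calls_per_session"], current["calls_per_session"])
    if abs(rate) > rate_threshold:
        deviations.append(("calls_per_session", round(rate, 2)))
    distance = mix_distance(baseline["tool_mix"], current["tool_mix"])
    if distance > mix_threshold:
        deviations.append(("tool_mix", round(distance, 2)))
    return deviations


# Figures from the example; the "other" shares are illustrative assumptions.
baseline = {"calls_per_session": 15, "tool_mix": {"kb_lookup": 0.80, "other": 0.20}}
current = {
    "calls_per_session": 35,
    "tool_mix": {"kb_lookup": 0.50, "crm_update": 0.40, "other": 0.10},
}

deviations = check_behavior(baseline, current)
print(deviations)
```

With these inputs both metrics are flagged: the call rate is up by roughly 133% and the tool mix has moved by a distance of about 0.4.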

Each drift type requires a different detection method and a different remediation path. Certification drift needs a calendar and a regulatory monitoring feed. Configuration drift needs before/after snapshots and a diff engine. Behavioral drift needs a runtime baseline and statistical anomaly detection. Treating all drift the same, or worse, not distinguishing between types at all, leads to either false alarms that erode trust in the governance system or missed violations that surface during the next audit.

The seven drift triggers

Every material change to an agent or its environment can be traced to one of seven triggers. Each maps to a drift type, a detection method and a remediation action.

1. Model version update. The LLM provider pushes a new version. The agent’s behavior may change even though no one touched its configuration. Drift type: behavioral. Detection: baseline comparison after model change event. Remediation: re-validate behavioral baseline; re-certify if material deviation.

2. Data source change. A new database, API or file system is connected to the agent. The agent’s data sensitivity classification may need to change. Drift type: configuration. Detection: webhook event from connector or infrastructure layer. Remediation: re-score data sensitivity dimension; re-classify and re-certify if tier changes.

3. Tool permission expansion. The agent is granted access to new tools or broader permissions on existing tools. Its decision authority classification may need to change. Drift type: configuration. Detection: permission change event in the registry. Remediation: re-score decision authority dimension; re-classify if tier changes.

4. Owner departure. The person accountable for the agent leaves the organization. The agent becomes orphaned. No one is responsible for its governance. Drift type: certification (accountability gap). Detection: HR system integration or manual flag. Remediation: assign new owner immediately; trigger governance review.

5. Dependency change. An upstream agent or service changes its output format, behavior or availability. The governed agent may behave differently even though nothing about it changed. Drift type: behavioral. Detection: dependency graph monitoring + baseline comparison. Remediation: investigate downstream impact; re-validate behavioral baseline.

6. Regulatory change. A new law, updated framework or reinterpreted requirement changes the compliance environment. Existing certifications may no longer cover all applicable requirements. Drift type: certification. Detection: regulatory monitoring feed + framework version tracking. Remediation: gap analysis against new requirements; re-certify with additional evidence.

7. Behavioral anomaly. The agent’s runtime patterns diverge from its established baseline without any identifiable configuration or environmental trigger. Drift type: behavioral. Detection: continuous baseline comparison (every 15 minutes). Remediation: investigate root cause; restrict if safety-critical; re-certify with updated baseline.
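
The seven triggers lend themselves to a small lookup table. The sketch below encodes the mapping from the list above; the `DriftType` and `Trigger` types and the `triggers_for` helper are hypothetical names chosen for illustration, and the detection and remediation strings are shortened.

```python
from enum import Enum
from typing import NamedTuple


class DriftType(Enum):
    CERTIFICATION = "certification"
    CONFIGURATION = "configuration"
    BEHAVIORAL = "behavioral"


class Trigger(NamedTuple):
    name: str
    drift_type: DriftType
    detection: str
    remediation: str


TRIGGERS = {
    1: Trigger("Model version update", DriftType.BEHAVIORAL,
               "baseline comparison after model change event",
               "re-validate baseline; re-certify if material"),
    2: Trigger("Data source change", DriftType.CONFIGURATION,
               "webhook from connector or infrastructure layer",
               "re-score data sensitivity; re-classify if tier changes"),
    3: Trigger("Tool permission expansion", DriftType.CONFIGURATION,
               "permission change event in the registry",
               "re-score decision authority; re-classify if tier changes"),
    4: Trigger("Owner departure", DriftType.CERTIFICATION,
               "HR system integration or manual flag",
               "assign new owner; trigger governance review"),
    5: Trigger("Dependency change", DriftType.BEHAVIORAL,
               "dependency graph monitoring + baseline comparison",
               "investigate downstream impact; re-validate baseline"),
    6: Trigger("Regulatory change", DriftType.CERTIFICATION,
               "regulatory monitoring feed + framework version tracking",
               "gap analysis; re-certify with additional evidence"),
    7: Trigger("Behavioral anomaly", DriftType.BEHAVIORAL,
               "continuous baseline comparison (every 15 minutes)",
               "investigate; restrict if safety-critical; re-certify"),
}


def triggers_for(drift_type: DriftType) -> list[int]:
    """Trigger numbers that map to the given drift type."""
    return [number for number, trigger in TRIGGERS.items()
            if trigger.drift_type is drift_type]


for drift_type in DriftType:
    print(drift_type.value, triggers_for(drift_type))
```

Keeping the mapping as data rather than branching logic means a new trigger can be added without touching the code that consumes it.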

Worked example: 60 days after certification

Your customer support agent was certified on January 15. Here’s what happened in the 60 days since:

  • Day 12: LLM provider pushed model update (Trigger 1, behavioral drift risk)
  • Day 23: Developer connected the agent to the returns database (Trigger 2, configuration drift, data sensitivity may have increased)
  • Day 31: Original agent owner moved to a different team (Trigger 4, certification drift, accountability gap)
  • Day 45: The company deployed a new triage agent that routes tickets to your agent (Trigger 5, behavioral drift, the input distribution changed)
  • Day 55: Behavioral baseline comparison shows tool call volume up 130% and new data access patterns (Trigger 7, behavioral anomaly)

Five triggers fired in 60 days. The agent’s certification from January 15 reflects none of these changes. Without continuous monitoring, the first time anyone discovers the gap is the next quarterly audit or, worse, a compliance incident.
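
To make the backlog concrete, the timeline can be replayed as a simple drift log. This is an illustrative sketch: the year 2026 is an assumption (the text gives only "January 15"), and `open_drift` is a hypothetical helper.

```python
from collections import Counter
from datetime import date, timedelta

CERTIFIED_ON = date(2026, 1, 15)  # assumed year; the text only says January 15

# (day offset, trigger number, drift type), taken from the timeline above.
EVENTS = [
    (12, 1, "behavioral"),
    (23, 2, "configuration"),
    (31, 4, "certification"),
    (45, 5, "behavioral"),
    (55, 7, "behavioral"),
]


def open_drift(as_of_day: int) -> list[tuple[date, int, str]]:
    """Events that fired by the given day and are not reflected in the certification."""
    return [
        (CERTIFIED_ON + timedelta(days=offset), trigger, kind)
        for offset, trigger, kind in EVENTS
        if offset <= as_of_day
    ]


backlog = open_drift(60)
by_type = Counter(kind for _, _, kind in backlog)
print(len(backlog), dict(by_type))
```

By day 60 the log holds five unresolved events, three of them behavioral, none of which appear in the January 15 certification record.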

From point-in-time to continuous certification

The continuous certification model replaces periodic audits with always-on monitoring that detects drift as it happens and triggers re-assessment automatically. An agent registry that tracks every agent’s configuration state, certification status and risk tier is the foundation. Without a single source of truth for what each agent is and what it’s certified against, drift detection has nothing to compare.

Auto-expiry by risk tier

Not all agents need the same certification cadence. The risk tier from Pillar 2 determines how frequently certifications expire and how often formal re-assessments occur:

| Tier | Label | Cert expiry | Formal re-assessment | Rationale |
| --- | --- | --- | --- | --- |
| 1 | Low | 365 days | Annually | Stable behavior, low blast radius, minimal regulatory exposure |
| 2 | Medium | 365 days | Every 180 days | Moderate risk, periodic review catches gradual drift |
| 3 | High | 180 days | Every 90 days | Frequent changes, sensitive data or external-facing |
| 4 | Critical | 90 days | Continuous + 90-day formal | Maximum oversight, safety-critical or regulated |
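
The tier policy above translates into simple date arithmetic. In this sketch, `TIER_POLICY` and `certification_schedule` are hypothetical names, "annually" is taken to mean 365 days, and only the formal 90-day re-assessment is modeled for Tier 4.

```python
from datetime import date, timedelta

# tier -> (certification lifetime, formal re-assessment interval), in days.
TIER_POLICY = {
    1: (365, 365),  # "annually" treated as 365 days (assumption)
    2: (365, 180),
    3: (180, 90),
    4: (90, 90),    # continuous monitoring is not modeled here
}


def certification_schedule(certified_on: date, tier: int) -> dict[str, date]:
    """Compute the expiry date and the first formal re-assessment date."""
    lifetime_days, reassess_days = TIER_POLICY[tier]
    return {
        "expires_on": certified_on + timedelta(days=lifetime_days),
        "next_reassessment": certified_on + timedelta(days=reassess_days),
    }


schedule = certification_schedule(date(2026, 1, 15), tier=3)
print(schedule["expires_on"], schedule["next_reassessment"])
```

For a Tier 3 agent certified on January 15, 2026 (an assumed date), this gives a first re-assessment on April 15 and expiry on July 14.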

Drift detection cadence

Different drift types require different detection frequencies:

Configuration drift: on every change event. When a webhook fires (new data source, tool permission change, dependency update), compare the current configuration against the certified state immediately. No delay. The diff is computed in real-time and a drift event is created if the change is material.

Behavioral drift: every 15 minutes. The behavioral baseline comparison runs on a 15-minute cadence: frequent enough to catch deviations before they compound, infrequent enough to avoid alert fatigue. Compare current tool call rates, data access patterns, error rates and session metrics against the established baseline.

Certification drift: daily. Check all active certifications against their expiry dates once per day. Send a warning alert 30 days before expiry. Flag expired certifications immediately. Monitor for regulatory changes on a weekly cadence and trigger gap analyses when new requirements are identified.
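
The daily certification check reduces to classifying each certificate by the days remaining. A minimal sketch, assuming a certificate stays valid through its expiry date (the text does not say whether the expiry day itself counts) and using the hypothetical name `certification_status`:

```python
from datetime import date

WARNING_WINDOW_DAYS = 30  # warning lead time from the text


def certification_status(expires_on: date, today: date) -> str:
    """Classify a certification as 'active', 'warning' or 'expired'."""
    days_left = (expires_on - today).days
    if days_left < 0:
        return "expired"
    if days_left <= WARNING_WINDOW_DAYS:
        return "warning"
    return "active"


expiry = date(2026, 7, 14)  # illustrative expiry date
for today in (date(2026, 6, 1), date(2026, 6, 14), date(2026, 7, 15)):
    print(today, certification_status(expiry, today))
```

A scheduled job would run this once per day over all active certifications and raise an alert whenever the status changes.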

Auto-escalation rules

When drift is detected, the system escalates automatically based on severity:

30 days before certification expiry: Warning alert sent to the agent owner and the compliance team. No action required yet. This is a heads-up to schedule re-certification.

Certification expired (Tier 1-2): Alert flagged for review. The agent continues operating, but the expired certification is visible in the compliance dashboard and the agent’s status changes to “Certification Expired” in the registry.

Certification expired (Tier 3-4): The agent is blocked from production via the production gate. It cannot process new requests until re-certified. In-flight requests complete, but no new work is accepted. This is a hard block enforced at the API level, not a guideline.

Configuration change detected on a certified agent: A drift event is created with the before/after diff. Severity is assessed automatically based on which dimensions are affected (data sensitivity changes are always high-severity; tag changes are low-severity). Material changes trigger a re-certification workflow.

Behavioral anomaly detected: Alert routed to the agent owner with the specific metrics that deviated and by how much. For Tier 3-4 agents, the guardian agent may automatically restrict the agent’s scope while the anomaly is investigated.
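
These rules can be expressed as a single decision function. The event and action names below are hypothetical labels for the behaviors described above, not identifiers from any real system.

```python
def escalate(event: str, tier: int) -> list[str]:
    """Map a drift event and the agent's risk tier (1-4) to escalation actions."""
    high_risk = tier >= 3
    if event == "expiry_warning":  # 30 days before expiry
        return ["notify_owner", "notify_compliance_team"]
    if event == "certification_expired":
        if high_risk:
            return ["block_new_requests"]  # in-flight requests still complete
        return ["flag_for_review", "mark_certification_expired"]
    if event == "material_config_change":
        return ["create_drift_event", "start_recertification"]
    if event == "minor_config_change":
        return ["create_drift_event"]
    if event == "behavioral_anomaly":
        actions = ["notify_owner_with_metrics"]
        if high_risk:
            # The guardian agent may restrict scope during the investigation.
            actions.append("allow_guardian_restriction")
        return actions
    raise ValueError(f"unknown drift event: {event}")


print(escalate("certification_expired", tier=2))
print(escalate("certification_expired", tier=4))
```

The same expired certification produces a review flag for a Tier 2 agent and a hard block for a Tier 4 agent, which is the tier-dependent behavior the rules describe.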

The production gate

For Tier 3 and above, the production gate creates a hard enforcement boundary: agents at the Staging lifecycle stage cannot promote to Production without an active, non-expired certification. This isn’t a checkbox. It’s an API-level block with an explanatory error. If the certification has expired or if a material configuration change has invalidated it, the promotion is denied until the agent is re-certified.

This single mechanism prevents the most common drift failure: a high-risk agent operating in production with a stale certification that no one noticed had expired.
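
A production gate of this kind might look like the following sketch. The agent record layout, the `PromotionDenied` exception and the `promote_to_production` function are all hypothetical; the point is that the check raises an explanatory error rather than logging a warning.

```python
from datetime import date


class PromotionDenied(Exception):
    """Raised when an agent may not be promoted to Production."""


def promote_to_production(agent: dict, today: date) -> str:
    """Return the new lifecycle stage, or raise PromotionDenied with a reason."""
    if agent["stage"] != "staging":
        raise ValueError("only agents at the Staging stage can be promoted")
    if agent["tier"] >= 3:  # the gate applies to Tier 3 and above
        cert = agent.get("certification")
        if cert is None:
            raise PromotionDenied("no certification on record")
        if cert["invalidated"]:
            raise PromotionDenied("certification invalidated by a material configuration change")
        if cert["expires_on"] < today:
            raise PromotionDenied(f"certification expired on {cert['expires_on']}")
    return "production"


agent = {
    "stage": "staging",
    "tier": 3,
    "certification": {"expires_on": date(2026, 7, 14), "invalidated": False},
}

print(promote_to_production(agent, today=date(2026, 3, 1)))
try:
    promote_to_production(agent, today=date(2026, 8, 1))
except PromotionDenied as reason:
    print("denied:", reason)
```

In a real deployment this check would sit behind the promotion API so that it cannot be bypassed by a client.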

The regulatory case for continuous monitoring

The EU AI Act doesn’t use the word “drift,” but its requirements map directly to the continuous monitoring model.

Article 9(2) describes the risk management system for high-risk AI systems as a “continuous iterative process” that is planned and run throughout the system’s entire lifecycle and requires regular systematic review and updating: not a one-time assessment, but an ongoing process that evolves as the system operates. Article 72 (Article 61 in the Commission’s original proposal) requires providers to establish a post-market monitoring system based on a post-market monitoring plan. Together, these articles create a regulatory expectation of continuous compliance, not point-in-time certification.

The XRSI RDG-AX framework (Kavya Pearlman, 2026) makes this explicit for agents. Gate 5 of the framework defines “drift detection, recertification cadence and ongoing runtime action visibility” as mandatory post-deployment governance requirements.

On the liability side, Baker Botts’ legal analysis notes that non-human identities are expected to exceed 45 billion by the end of 2026, yet only 10% of organizations have a strategy for managing them. California’s AB 316, effective January 1, 2026, forecloses the “AI did it” defense. If your agent causes harm, you cannot argue that you lacked control over its decisions. Continuous monitoring is no longer a best practice; it’s a legal necessity.

Deloitte’s 2026 report on AI ROI in the Nordics found that 58% of Nordic respondents using agentic AI anticipate 3+ years for significant ROI, compared to 37% in the rest of Europe. Part of this caution stems from governance uncertainty: organizations don’t know which agents are safe to scale. Continuous certification resolves this directly: agents with active, non-expired certifications and clean drift records can be scaled with confidence. Agents with expired certifications or unresolved drift events are held.

Designing a drift detection system

Moving from point-in-time compliance to continuous monitoring requires a system that watches for all three drift types simultaneously. The architecture isn’t complex, but it has to be comprehensive. A gap in any detection layer creates a blind spot where drift accumulates unnoticed.

IBM’s framework for agent observability provides a useful starting point for understanding the monitoring infrastructure required.

Video: “The truth about AgentOps and AI agents flying blind” (YouTube)

The drift detection architecture has four components that work together:

Configuration snapshot store. Every state change to an agent (registration, classification, certification, tool permission change, data source connection, owner assignment) is recorded as a before/after snapshot in an immutable audit log. When a drift check runs, the current state is compared against the certified state. Any delta is evaluated for materiality.

Behavioral baseline engine. After 30+ tool calls, the system builds a behavioral profile for each agent: typical call rate, tool frequency distribution, edit size patterns, data access patterns and error rates. The baseline is stored as the reference point. Every 15 minutes, current metrics are compared against the baseline. Deviations beyond configurable thresholds generate drift events. This is the same mechanism described in the Guardian Agents article. The guardian detects behavioral drift and the certification system records it.

Expiry calendar. A scheduled job checks all active certifications daily. Certifications approaching expiry (30-day warning) generate informational alerts. Expired certifications generate critical alerts and, for Tier 3+, trigger the production gate to block the agent.

Alert router. Drift events are routed based on severity and agent tier. Low-severity events (informational, Tier 1-2) go to the agent owner’s dashboard. High-severity events (expired certs, material config changes, behavioral anomalies on Tier 3-4) go to the compliance team and trigger automated workflows (restrict agent, create re-certification task, notify stakeholders).
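
The alert router's decision can be sketched as a small function. The event kinds, destination names and the `Route` type are hypothetical, and sending every event not listed as high-severity to the owner's dashboard is an assumption; the text does not cover every combination of event and tier.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    destination: str
    workflows: list[str] = field(default_factory=list)


# Event kinds treated as high-severity regardless of tier.
ALWAYS_HIGH = {"certification_expired", "material_config_change"}
HIGH_SEVERITY_WORKFLOWS = ["restrict_agent", "create_recertification_task", "notify_stakeholders"]


def route_drift_event(kind: str, tier: int) -> Route:
    """Choose a destination and automated workflows for a drift event."""
    high = kind in ALWAYS_HIGH or (kind == "behavioral_anomaly" and tier >= 3)
    if high:
        return Route("compliance_team", list(HIGH_SEVERITY_WORKFLOWS))
    return Route("owner_dashboard")  # assumed default for everything else


print(route_drift_event("behavioral_anomaly", tier=4))
print(route_drift_event("behavioral_anomaly", tier=1))
```

Keeping routing separate from detection lets each detection layer emit events in one format while the escalation policy changes independently.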

This architecture connects to the rest of the governance stack: Policy-as-Code (Pillar 3) defines the rules that determine what constitutes a material change. Risk Classification (Pillar 2) determines severity calibration. Adaptive Human Oversight (Pillar 6) routes exceptions to the right human reviewer. And the 8-Pillar Framework ties it all together.

Drift is the default

Every certified agent is drifting. The question isn’t whether your agents have drifted since their last certification. They have. The question is whether you can detect it, measure it and respond to it before the auditor arrives.

The continuous certification model (auto-expiry by risk tier, 15-minute behavioral monitoring, event-driven configuration drift detection and automated escalation) turns compliance from a periodic exercise into an always-on system. Agents that are compliant stay compliant. Agents that drift get caught, flagged and remediated before the gap becomes a violation.

Start with your Tier 3 and Tier 4 agents. Set up expiry tracking. Enable configuration snapshots. Build behavioral baselines. And connect the drift detection system to your compliance certification pipeline so that no high-risk agent operates without a current, valid certification.

The certification is the starting point, not the destination. What matters is what happens after.

Sources and further reading

| Source | URL |
| --- | --- |
| Kyndryl / Ismail Amla, “Agentic Drift” definition (Mar 2026) | https://www.siliconrepublic.com/machines/kyndryl-policy-as-code-ai-agentic-drift-enterprise |
| AGAT Software, AI Agent Security 2026 (82% executive confidence gap) | https://agatsoftware.com/blog/ai-agent-security-enterprise-2026/ |
| XRSI RDG-AX Framework (Kavya Pearlman, 2026) | https://xrsi.org/why-governing-agentic-ai-requires-a-new-kind-of-framework |
| Baker Botts, “When AI Agents Misbehave” (2026) | https://ourtake.bakerbotts.com/post/102me2l/when-ai-agents-misbehave-governance-and-security-for-autonomous-ai |
| MI9 - Agent Intelligence Protocol (Wang et al.) | https://arxiv.org/html/2508.03858v1 |
| Engin & Hand, “Dimensional Governance for Agentic AI” (UCL/Imperial) | https://arxiv.org/abs/2505.11579 |
| Noam Kolt, “Governing AI Agents” (Notre Dame Law Review) | https://arxiv.org/abs/2501.07913 |
| EU AI Act, Article 6 (Classification Rules) | https://artificialintelligenceact.eu/article/6/ |
| EU AI Act, Articles 9, 61, 72 (Monitoring Requirements) | https://artificialintelligenceact.eu/ |
| Deloitte, AI ROI in the Nordics (2026) | https://www.deloitte.com/no/no/issues/generative-ai/ai-roi-in-the-nordics.html |
| KDnuggets, “Emerging Trends in AI Ethics and Governance” (Dec 2025) | https://www.kdnuggets.com/emerging-trends-in-ai-ethics-and-governance-for-2026 |
| Dignum, CAIML/IWM Presentation (Sep 2025) | https://caiml.org/dighum/announcements/virginia-dignum-beyond-hype-and-fear-2025-09-08/ |