SR 26-2 lands: agentic AI was carved out. Here's what banks running agents should do today

The footnote that defines a decade

The Office of the Comptroller of the Currency, the Federal Reserve and the FDIC published SR 26-2 on April 17 2026. It is the first joint update to model risk management guidance since SR 11-7 in 2011. Banks have been waiting fifteen years.

Page 3 of the attachment, footnote 3:

“Generative AI and agentic AI models are novel and rapidly evolving. As such, they are not within the scope of this guidance.”

The follow-on sentence is the part most coverage will skip:

“Nonetheless, a banking organization’s risk management and governance practices should guide the determination of appropriate governance and controls for any tools, processes or systems not covered in this document.”

Translation: the agencies are not going to write the playbook for agents yet. They expect banks to write their own. The bank that produces a coherent answer in 2026 will set the precedent the supervisor uses in 2027.

That is the story. Banks that have already deployed agents in production are operating in a gap year. The principles in SR 26-2 apply to “traditional statistical and quantitative models and non-generative, non-agentic AI models.” Everything else is the bank’s problem.

Inside SR 26-2

Strip the agent question for a moment. The shape of SR 26-2 is worth reading on its own merits because it tells you what the agencies think sound model risk management looks like in 2026.

Tailored and risk-based. The guidance opens with the line that “practices that are appropriate and effective for one banking organization may be inappropriate and ineffective for a banking organization with a different risk profile or that uses a model for a different purpose.” A small institution with a handful of credit-scoring models is not expected to operate the same machinery as a global bank running a thousand pricing, capital, stress-testing and surveillance models. Materiality decides rigor.

Figure: Page 3 of the SR 26-2 attachment. Footnote 3, at the bottom, places generative and agentic AI outside the scope of the guidance. Source: Federal Reserve, SR 26-2 attachment (PDF).

Three pillars, preserved. The structure that practitioners learned from SR 11-7 carries over almost intact. Model development and use, including testing. Model validation and monitoring, including outcomes analysis. Governance and controls. The vocabulary is updated. The architecture is the same.

Effective challenge throughout. The guidance hammers on independent critical analysis as the connective tissue. Effective challenge means the people questioning the model have the technical expertise, the organizational independence and the standing to change it. That last clause matters. A validation function that can flag concerns but cannot stop deployment is not effective challenge. It is a paper trail.

Aggregate risk recognized. SR 26-2 is more explicit than SR 11-7 was about the failure mode where individual models look fine and the portfolio does not. “Aggregate risk reflects interactions and dependencies among models; reliance on common assumptions, data or methodologies; and any other factors that could adversely affect several models and their outputs simultaneously.” This is correlation risk in model risk clothing.

Materiality controls everything. Not every model warrants the full machine. The guidance is explicit that immaterial models can be governed lightly, with monitoring focused on detecting whether they have become material. Allocate scrutiny where the loss exposure is.

The principles are sound. They are also constructed for a world where the model produces a number, the user reads the number and the user takes the action. That world is not the world of the agent.

Where the principles transfer to agents and where they break

If you operate a production agent, the SR 26-2 framework gives you scaffolding for half the problem.

Conceptual soundness transfers. The validation question for a credit model is whether the modelling choices, assumptions and theory hold up. The validation question for an agent is whether the prompt scaffolding, tool authorizations, decision policies and escalation rules hold up. The shape is the same. The artifacts are different.

Outcomes analysis transfers. SR 11-7 taught us to compare model output to real-world outcomes. For an agent, outcomes analysis is comparing what the agent did to what it should have done. Measure that by whether the action satisfied the policy, completed the task within scope and stayed inside its authorization boundary. The cadence is faster (real-time, not quarterly) but the discipline is the same.
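A minimal sketch of what that comparison could look like in code, assuming each reviewed action has already been judged against the three criteria named above. The `ActionRecord` type, its field names and the sample data are hypothetical, not drawn from SR 26-2 or any particular platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActionRecord:
    """One reviewed agent action. All field names are illustrative assumptions."""
    action_id: str
    satisfied_policy: bool      # did the action comply with the policy in force?
    within_task_scope: bool     # did it complete the task without leaving scope?
    within_authorization: bool  # did it stay inside its authorization boundary?


def action_passes(record: ActionRecord) -> bool:
    # An action only counts as a good outcome if all three criteria hold.
    return (record.satisfied_policy
            and record.within_task_scope
            and record.within_authorization)


def outcomes_summary(records: list) -> dict:
    """Summarize pass rate and list the failing actions for follow-up."""
    failed = [r.action_id for r in records if not action_passes(r)]
    total = len(records)
    pass_rate: Optional[float] = (total - len(failed)) / total if total else None
    return {"total": total, "pass_rate": pass_rate, "failed_action_ids": failed}


# Hypothetical review sample.
reviewed = [
    ActionRecord("a-001", True, True, True),
    ActionRecord("a-002", True, False, True),   # drifted outside task scope
    ActionRecord("a-003", True, True, True),
    ActionRecord("a-004", False, True, False),  # policy and authorization breach
]
summary = outcomes_summary(reviewed)
print(summary)
```

The hard part in practice is producing the three booleans, not aggregating them. The sketch only shows the shape of the evidence.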

Ongoing monitoring transfers. The agent equivalent is runtime observability: every tool invocation, every LLM request, every data access. A bank that already has telemetry on traditional models has the operational muscle to extend it. A bank that does not is going to learn the hard way.

Governance and controls transfer. Roles, accountability, policies and audit independence all map directly. Most banks will need to add a tier (an agent steering committee or equivalent) but the structural ask is familiar.

Then the principles break.

SR 26-2 assumes the model produces an estimate. The definition of “model” is “a complex quantitative method, system or approach that applies statistical, economic or financial theories to process input data into quantitative estimates.” Agents do not produce estimates. They produce actions. The validation framework for “is this estimate accurate” does not generalize to “is this action authorized, reversible and bounded.”

SR 26-2 assumes a human decides what to do with the output. The framework for human-in-the-loop oversight is implicit throughout. Agents close that loop. The right question stops being “did the human use the model output well” and becomes “did the agent’s autonomous action stay within the policy.” That is a different validation problem.

SR 26-2 has no concept of tool use. The guidance covers data inputs, methodology choices and outputs. It says nothing about an agent that calls fifteen APIs, writes to two databases and sends three emails as part of producing what an SR 26-2-style model would call a single output. The risk surface for tool use (poisoned tools, scope escalation, chained authorizations) is invisible inside the SR 26-2 framework.

SR 26-2 has no concept of behavioral drift. Models drift; agents drift differently. An agent’s behavior can shift because the underlying foundation model was updated, because its prompt scaffolding was edited, because its tool catalog changed or because its instructions interact with new types of input it was not validated against. Continuous certification (re-validation on a cadence and on every material change) is the agent equivalent of ongoing monitoring. SR 26-2 does not name it.
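One way to make "on every material change" mechanical is to fingerprint the configuration an agent was certified against and compare it with what is running. The sketch below is an assumption about how a bank might do this, not something the guidance prescribes. The function names and version strings are hypothetical. It covers three of the four drift sources above (foundation model, prompt scaffolding, tool catalog). New input types do not change the configuration, so they still need cadence-based re-validation.

```python
import hashlib
import json


def certification_fingerprint(foundation_model_version: str,
                              prompt_scaffolding: str,
                              tool_catalog: list) -> str:
    """Hash the parts of an agent's configuration whose change should trigger re-validation."""
    payload = json.dumps(
        {
            "foundation_model_version": foundation_model_version,
            "prompt_scaffolding": prompt_scaffolding,
            # Sort so that reordering the catalog is not treated as a change.
            "tool_catalog": sorted(tool_catalog),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_recertification(certified: str, current: str) -> bool:
    # Any difference means the running agent is not the agent that was validated.
    return certified != current


# Hypothetical example: the vendor ships a new foundation model version.
certified = certification_fingerprint(
    "fm-2026-03", "You are a KYC triage agent.", ["read_customer", "open_case"])
reordered = certification_fingerprint(
    "fm-2026-03", "You are a KYC triage agent.", ["open_case", "read_customer"])
upgraded = certification_fingerprint(
    "fm-2026-04", "You are a KYC triage agent.", ["read_customer", "open_case"])

print(needs_recertification(certified, reordered))  # same configuration
print(needs_recertification(certified, upgraded))   # model version changed
```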

SR 26-2 has no concept of inter-agent dependencies. Aggregate risk in SR 26-2 is about correlated assumptions across models. Aggregate risk for agents is about agents calling other agents, with authorization tokens cascading through the chain and policy violations propagating across systems no individual control owner sees.
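To see why no individual control owner sees the whole picture, it helps to compute what one agent can reach through the agents it calls. The sketch below assumes the bank can export a call graph and per-agent tool grants. The agent names, the tool names and the assumption that a caller effectively inherits its callees' tools are all illustrative.

```python
from collections import deque


def downstream_reach(calls: dict, start: str) -> set:
    """Return every agent reachable from `start` through agent-to-agent calls."""
    seen = set()
    queue = deque(calls.get(start, ()))
    while queue:
        agent = queue.popleft()
        if agent == start or agent in seen:
            continue  # guards against cycles in the call graph
        seen.add(agent)
        queue.extend(calls.get(agent, ()))
    return seen


def effective_tools(calls: dict, grants: dict, start: str) -> set:
    """Union of tools held by `start` and by every agent it can reach."""
    tools = set()
    for agent in {start} | downstream_reach(calls, start):
        tools |= grants.get(agent, set())
    return tools


# Hypothetical call graph with a cycle: intake -> kyc -> payments -> intake.
calls = {"intake": {"kyc"}, "kyc": {"payments"}, "payments": {"intake"}}
grants = {
    "intake": {"read_ticket"},
    "kyc": {"read_customer"},
    "payments": {"send_wire"},
}

reach = downstream_reach(calls, "intake")
tools = effective_tools(calls, grants, "intake")
print(sorted(reach), sorted(tools))
```

In this example the intake agent was only ever granted `read_ticket`, yet its effective reach includes `send_wire`. That difference is the aggregate exposure an agent-by-agent review would miss.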

This is the gap that needs to be closed before the agencies’ expected request for information (RFI) lands.

The operating playbook for the gap year

A bank running production agents in 2026 has a defensible posture if it does five things now.

Maintain a current and complete agent inventory. Every agent that touches any production system, with its owner, its risk classification, the foundation model it depends on, its tool catalog, its data access and its compliance status. The agencies’ first question in any future agent examination will be “show us your agent inventory.” A bank that cannot answer that question quickly is operating with the same risk profile that drove the original SR 11-7: concentration of activity inside something the institution does not see clearly.
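As a rough illustration, the inventory fields listed above fit in a small record type, and "complete" becomes something you can check. The class name, field names and sample values are assumptions made for this sketch. A real inventory would live in a system of record, not a Python list.

```python
from dataclasses import dataclass, field, fields


@dataclass
class AgentRecord:
    """One inventory entry, mirroring the fields named in the text."""
    agent_id: str
    owner: str = ""
    risk_classification: str = ""
    foundation_model: str = ""
    tool_catalog: list = field(default_factory=list)
    data_access: list = field(default_factory=list)
    compliance_status: str = ""


def missing_fields(record: AgentRecord) -> list:
    # A blank string means "not recorded". Empty lists are allowed here,
    # since an agent may legitimately have no tools or data access.
    return [f.name for f in fields(record) if getattr(record, f.name) == ""]


def inventory_gaps(inventory: list) -> dict:
    """Map agent_id to its unrecorded fields, for incomplete entries only."""
    gaps = {}
    for record in inventory:
        missing = missing_fields(record)
        if missing:
            gaps[record.agent_id] = missing
    return gaps


# Hypothetical inventory with one complete and one incomplete entry.
inventory = [
    AgentRecord("kyc-triage", owner="ops-risk", risk_classification="tier-2",
                foundation_model="vendor-model-x", tool_catalog=["read_customer"],
                data_access=["customer_pii"], compliance_status="certified"),
    AgentRecord("email-drafter", risk_classification="tier-4",
                foundation_model="vendor-model-x"),
]
gaps = inventory_gaps(inventory)
print(gaps)
```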

We cannot protect what we cannot see. In the era of agentic AI, organizations need an observability control plane.

Classify every agent by risk tier. Risk classification in SR 26-2 is materiality plus exposure plus purpose. The agent equivalent is autonomy level (advisory through fully autonomous), data sensitivity (public through restricted), action reversibility (logging only through irreversible external action) and compliance footprint (out-of-scope through regulated decisioning). A four-by-four-by-four-by-four matrix is too much to operate; a single tier-1-through-tier-4 designation with the inputs documented is enough. The point is to make scrutiny proportional to exposure.
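One simple way to collapse the four dimensions into a single tier is to let the worst dimension decide. The sketch below assumes each dimension is scored 1 (lowest exposure) to 4 (highest), and that tier 1 carries the most scrutiny. Both conventions and the worst-dimension rule are assumptions for illustration. The source fixes only the four inputs and the tier-1-through-tier-4 range.

```python
def classify_agent(autonomy: int, data_sensitivity: int,
                   irreversibility: int, compliance_footprint: int) -> dict:
    """Collapse four 1-4 exposure scores into one tier, keeping the inputs on record.

    Assumed scale: 1 = lowest exposure (advisory, public data, logging only,
    out of scope) and 4 = highest (fully autonomous, restricted data,
    irreversible external action, regulated decisioning).
    """
    inputs = {
        "autonomy": autonomy,
        "data_sensitivity": data_sensitivity,
        "irreversibility": irreversibility,
        "compliance_footprint": compliance_footprint,
    }
    for name, score in inputs.items():
        if score not in (1, 2, 3, 4):
            raise ValueError(f"{name} must be between 1 and 4, got {score!r}")

    worst = max(inputs.values())
    return {
        "tier": 5 - worst,  # assumed convention: tier 1 = most scrutiny
        "drivers": [name for name, score in inputs.items() if score == worst],
        "inputs": inputs,   # documented alongside the tier, as the text asks
    }


# Hypothetical agents.
chat_helper = classify_agent(1, 1, 1, 1)
case_router = classify_agent(2, 3, 1, 2)
wire_agent = classify_agent(4, 4, 4, 3)
print(chat_helper["tier"], case_router["tier"], wire_agent["tier"])
```

Recording the `drivers` makes the tier explainable: a reviewer can see which dimension pushed the agent into its tier without re-deriving the score.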

Run a validation gate before deployment. SR 11-7 institutions know how to gate models. Adapt the gate to agents:

  • Conceptual soundness review of prompt scaffolding and policies
  • Tool authorization review (is the agent allowed to access what it claims to need and only that)
  • Outcomes analysis on a representative test set with named pass criteria
  • Documented sign-off by validation, owner and a designated risk role

The form factor matters less than the artifact. There needs to be a record.
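The four gate checks above can be expressed as a function that returns findings, so that the record of why an agent was or was not approved falls out of the gate itself. This is a sketch under stated assumptions: the `GateSubmission` fields, the sign-off role names and the pass-rate threshold are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed role names for the three sign-offs listed in the text.
REQUIRED_SIGNOFFS = {"validation", "owner", "risk"}


@dataclass
class GateSubmission:
    agent_id: str
    soundness_review_done: bool  # prompt scaffolding and policies reviewed
    granted_tools: set           # what the agent can actually call
    needed_tools: set            # what the design says it needs
    test_pass_rate: float        # outcomes analysis on the test set
    required_pass_rate: float    # the named pass criterion
    signoffs: set = field(default_factory=set)


def gate_findings(sub: GateSubmission) -> list:
    """Return blocking findings. An empty list means the gate passes."""
    findings = []
    if not sub.soundness_review_done:
        findings.append("conceptual soundness review not completed")
    excess = sorted(sub.granted_tools - sub.needed_tools)
    if excess:
        findings.append("tools granted but not needed: " + ", ".join(excess))
    missing = sorted(sub.needed_tools - sub.granted_tools)
    if missing:
        findings.append("tools needed but not granted: " + ", ".join(missing))
    if sub.test_pass_rate < sub.required_pass_rate:
        findings.append(
            f"pass rate {sub.test_pass_rate:.2f} below required {sub.required_pass_rate:.2f}")
    unsigned = sorted(REQUIRED_SIGNOFFS - sub.signoffs)
    if unsigned:
        findings.append("missing sign-off: " + ", ".join(unsigned))
    return findings


# Hypothetical submission that should be held at the gate.
submission = GateSubmission(
    agent_id="kyc-triage",
    soundness_review_done=True,
    granted_tools={"read_customer", "send_email"},
    needed_tools={"read_customer"},
    test_pass_rate=0.97,
    required_pass_rate=0.99,
    signoffs={"validation", "owner"},
)
findings = gate_findings(submission)
approved = not findings
print(approved, findings)
```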

Monitor agents continuously in production. This is operational, not policy. Telemetry on every action with attribution to the agent, the user (if any), the foundation model version, the tool catalog version and the policy version in force. Real-time policy evaluation that can intercept a violation before it executes. Drift detection on a cadence measured in days, not quarters. Logs that an examiner can read.
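A minimal sketch of the interception idea, assuming every tool call is routed through a wrapper that evaluates policy first and logs the decision either way. The allowlist check stands in for a real policy engine, and `ActionContext`, `guarded_call` and the version strings are hypothetical names for this example.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class ActionContext:
    """Attribution fields named in the text, attached to every action."""
    agent_id: str
    user_id: Optional[str]  # None when no human initiated the action
    foundation_model_version: str
    tool_catalog_version: str
    policy_version: str
    tool: str


def guarded_call(ctx: ActionContext, allowed_tools: set,
                 tool_fn: Callable[[], object], log: list):
    """Evaluate policy before execution, log the decision, then run or block."""
    allowed = ctx.tool in allowed_tools  # stand-in for a real policy evaluation
    event = asdict(ctx)
    event["timestamp"] = datetime.now(timezone.utc).isoformat()
    event["decision"] = "allow" if allowed else "block"
    log.append(json.dumps(event, sort_keys=True))  # one readable line per action
    if not allowed:
        # Raised before tool_fn runs, so the violation never executes.
        raise PermissionError(f"{ctx.agent_id} may not call {ctx.tool}")
    return tool_fn()


audit_log: list = []
allowed_tools = {"lookup_balance"}
executed = []


def send_wire():
    executed.append("send_wire")
    return "sent"


ok_ctx = ActionContext("agent-7", "u-123", "fm-2026-03", "tools-v12",
                       "policy-v4", "lookup_balance")
result = guarded_call(ok_ctx, allowed_tools, lambda: "balance: 100", audit_log)

bad_ctx = ActionContext("agent-7", None, "fm-2026-03", "tools-v12",
                        "policy-v4", "send_wire")
try:
    guarded_call(bad_ctx, allowed_tools, send_wire, audit_log)
    blocked = False
except PermissionError:
    blocked = True
```

The design choice worth noting is that the log entry is written before the decision is enforced, so blocked attempts leave the same evidence trail as permitted actions.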

Govern agents with clear roles and effective challenge. The agent inventory has owners. Owners have an escalation path. There is a governance forum where new agent classes get reviewed before deployment, where certified agents come back for periodic re-review and where incidents get post-mortemed. The independent challenge function has the standing to halt deployment.

This is the SR 26-2 framework applied to agents, with the gaps in SR 26-2 filled in. It is not the only valid posture. It is a defensible one.

What the RFI will probably ask

Video: Vasu Jakkal's RSAC 2026 keynote framed the agentic-AI security and governance gap that supervisors will eventually have to legislate against (YouTube).

The agencies have signaled the questions they care about. Read the SR 26-2 cover letter, the OCC bulletin and the public statements made in the months before publication. Three themes recur.

How do banks identify agents that fall inside the model risk perimeter? The classic SR 11-7 definition (statistical or financial theory underpinning, quantitative estimate as output) does not capture agentic systems cleanly. The RFI will probe how banks are drawing the perimeter today.

What validation evidence is sufficient for an autonomous system whose outputs are not estimates but actions? The agencies will want to understand how banks are evidencing that an agent operates within its authorization boundary. They will want to know what the equivalent of outcomes analysis looks like when the outcome is “an action was taken” rather than “a number was produced.”

How is governance accountability allocated when the agent is built on a foundation model the bank does not control? Vendor model risk has been part of SR 11-7 from the start. Agents intensify the question. The agent’s behavior depends on the prompt scaffolding (the bank), the foundation model (a vendor), the tool catalog (typically a mix) and the runtime context (largely the bank). Allocating accountability across that chain is not solved.

A bank that has thought through these questions before the RFI lands has a head start. A bank that has not is responding to a regulator’s deadline.

The practical move

The agencies have given banks running agents an unusual freedom: you have the next twelve to thirty-six months to define what good looks like. The supervisor will read what the industry produces and will codify the median or the strongest. If your institution is investing in agents seriously, this is the window to invest in agent governance with comparable seriousness.

Roval implements this directly. The platform maintains the agent inventory, classifies agents by risk tier, runs the continuous certification loop, captures the runtime telemetry and produces the audit-ready evidence trail an examiner expects. It maps to the SR 26-2 principles where they apply (development and use, validation and monitoring, governance and controls, effective challenge) and adds the agent-specific controls SR 26-2 does not cover (tool authorization, runtime policy enforcement, behavioral drift detection, inter-agent dependency tracking).

For an introduction to how the framework maps end to end, see the eight pillars. For the financial-services-specific application, see agent governance in financial services. For the broader compliance baseline, SOC 2 for AI agents and ISO 42001 compliance for AI agents are the companion reads.

Sources

Source | Date | URL
Federal Reserve, SR 26-2 cover letter | April 17 2026 | https://www.federalreserve.gov/supervisionreg/srletters/SR2602.htm
Federal Reserve, SR 26-2 attachment (Supervisory Guidance on Model Risk Management) | April 17 2026 | https://www.federalreserve.gov/supervisionreg/srletters/SR2602a1.pdf
OCC Bulletin 2026-13, Model Risk Management: Revised Guidance | April 17 2026 | https://occ.gov/news-issuances/bulletins/2026/bulletin-2026-13.html
Federal Reserve, SR 11-7 (predecessor guidance) | April 4 2011 | https://www.federalreserve.gov/supervisionreg/srletters/sr1107.htm