SR 26-2 / OCC 2026-13: The Practical Guide to What Replaced SR 11-7

"The annual revalidation is dead" is the least interesting thing about SR 26-2. The real shift: from a calendar to a risk-based regime you have to defend.

June 8, 2026


Model Risk After SR 26-2: What Changes When the Annual Revalidation Goes Away

For fifteen years, model risk management ran on a calendar. SR 11-7 set the expectation in 2011, and the operating rhythm followed: every model, every year, a full revalidation whether or not anything about the model or its environment had changed. Validators spent Q4 re-running the same battery on models that hadn't drifted, while the models that actually mattered — the ones near a decision boundary, the ones quietly degrading on a population that no longer looked like the development sample — waited their turn in the queue.

On April 17, 2026, that framework was replaced. The Federal Reserve issued SR 26-2, the OCC issued Bulletin 2026-13, and the FDIC issued FIL-15-2026 — three instruments, coordinated, superseding SR 11-7 and the 2021 SR 21-8 model-risk guidance. The headline most people took away was "the annual revalidation is gone." That is true, and it is also the least interesting thing about the change.

The real shift is from a cadence-based regime to a risk-based one. The supervisors stopped telling you when to validate and started telling you to justify when you validate. That sounds like less work. For a credit risk function running PD, LGD, scoring, and CECL models, it is almost certainly more — and more defensible — work. This piece walks through what actually changed, how to build the tiering and trigger logic the new framework assumes you have, and the four places teams are going to get this wrong.

What actually changed

Strip away the summaries and four things matter for a credit shop.

Materiality tiering is now the organizing principle. The old guidance acknowledged that models differ in risk; the new framework makes risk-tiering the thing your entire validation program hangs on. You are expected to classify every model by the consequence of it being wrong, and to scale validation intensity to that tier. A challenger PD model feeding a research deck and the PD model feeding your CECL allowance are no longer subject to the same default expectation. That was always defensible in principle. It is now the baseline expectation.

The annual revalidation requirement is dropped — and replaced with trigger-based revalidation. Removing the calendar does not remove the obligation. It moves the obligation onto you to define the events that force a revalidation: performance degradation past a threshold, a material shift in input population, a change in the economic environment the model wasn't built for, a model change, a data-source change. The annual review becomes a floor for the highest tier and a justification exercise for everything below it. If a model goes two years without a full revalidation, the examiner's question is no longer "why didn't you follow the calendar" — it's "show me the monitoring that told you it was safe to wait."

The model definition narrowed. The new definition is tighter about what counts as a "model" versus a deterministic calculation or a business rule. This is a genuine relief for teams that spent years validating spreadsheet lookups as if they were neural nets — but it creates a new hazard, which we'll get to, because the things that fall out of the model definition don't fall out of existence.

Generative and agentic AI got a carve-out and an open question. The framework explicitly scopes traditional machine-learning models in — your gradient-boosted PD model is a model, full stop, and the "it's AI, the rules are different" argument does not work. But generative and agentic AI systems were carved out of the core framework and pushed into a separate request for information. The supervisors said, in effect, "we don't yet know how to write validation expectations for a system that generates its own intermediate reasoning, so tell us how you're governing it." That gap is not permission. It's an elevated-risk flag.

One more thing, and it matters for how you read all of the above: this guidance is framed as non-enforceable supervisory guidance, not a rule. It does not carry the force of law the way a regulation does. Do not mistake that framing for license to do less. Examiners still examine against it, boards still get briefed on it, and "the guidance isn't binding" has never once been a satisfying answer in an exam.

The old approach, and why the calendar was always a proxy

The annual revalidation was never the goal. It was a proxy for the goal, which is "models that are working get to keep working, and models that aren't get caught before they cause a loss." The calendar was a blunt instrument that approximated that outcome by brute force: check everything often enough that you'll probably catch the failures.

The cost of the proxy was misallocation. Validation capacity is finite. Spending it uniformly means underspending on the models near a decision boundary and overspending on the stable, low-consequence ones. Every credit risk team has lived this: the validators are heads-down on the annual cycle for a model that hasn't moved in three years, while a scorecard quietly degrades against a population that shifted after a product change, and nobody looks until the quarterly monitoring deck flags it — if it flags it.

The new framework removes the proxy and asks you to manage the actual objective directly. That is harder. A calendar requires no judgment. A risk-based program requires you to defend every judgment.

The better approach: tiering and triggers as a system

The framework assumes you have two things you may not have built yet: a defensible materiality tiering, and a monitoring layer that fires triggers. They work together. Tiering tells you how hard to look; triggers tell you when to look harder regardless of tier.

Building a defensible tier

Tiering fails in exams for one reason above all others: it looks reverse-engineered to minimize work. If every model that would be expensive to validate annually happens to land in the lowest tier, no examiner believes your rubric — they believe you started from the answer. A defensible tier is built from inputs that exist independently of the validation cost.

For a credit model, the consequence dimensions are concrete: dollar exposure governed by the model's output, whether the output feeds a regulatory number (CECL allowance, capital, fair-lending testing), reversibility of a wrong decision, and the number of downstream decisions or models that consume the output. Score each, weight them, and let the tier fall out.

# Materiality tiering rubric: illustrative weights, not prescriptive
def model_tier(model):
    score = (
        4.0 * model.exposure_band              # $ governed by the output, banded 1-5
        + 3.0 * model.feeds_regulatory_number  # CECL/capital/fair-lending: 0 or 1, scaled
        + 2.0 * model.irreversibility          # how hard to unwind a wrong call, 1-5
        + 2.0 * model.downstream_consumers     # count of dependent models/decisions, banded
    )
    if score >= 40:
        return "Tier 1"  # full annual revalidation floor + trigger monitoring
    elif score >= 20:
        return "Tier 2"  # full revalidation on trigger or 24-month outer bound
    else:
        return "Tier 3"  # ongoing monitoring; revalidate on trigger only

The specific weights are yours to set and defend. What matters is that the rubric is written down before it's applied, that it produces the same tier when two different people run it, and that the inputs aren't things you can quietly retune to move a model into a cheaper tier. The PD model feeding CECL lands in Tier 1 not because you decided it should but because the rubric forces it there — and you can show the examiner exactly why.
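One way to test the "same tier when two different people run it" property is to have two reviewers score a sample independently and compare the tiers that fall out. The sketch below reuses the illustrative weights and cutoffs from the rubric above. The `Scores` record, the reviewer inputs, the model names, and the choice to scale the regulatory flag to 0 or 5 are all assumptions made for the example.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Scores:
    """One reviewer's rubric inputs for one model (hypothetical record)."""
    exposure_band: int            # 1-5
    feeds_regulatory_number: int  # 0 or 5: the 0/1 flag scaled to the 1-5 range (assumption)
    irreversibility: int          # 1-5
    downstream_consumers: int     # banded 1-5

def tier(s: Scores) -> str:
    # Same illustrative weights and cutoffs as the rubric above.
    score = (4.0 * s.exposure_band + 3.0 * s.feeds_regulatory_number
             + 2.0 * s.irreversibility + 2.0 * s.downstream_consumers)
    if score >= 40:
        return "Tier 1"
    if score >= 20:
        return "Tier 2"
    return "Tier 3"

def disagreements(a: dict[str, Scores], b: dict[str, Scores]) -> dict[str, tuple[str, str]]:
    """Models scored by both reviewers whose inputs land in different tiers."""
    return {
        name: (tier(a[name]), tier(b[name]))
        for name in a.keys() & b.keys()
        if tier(a[name]) != tier(b[name])
    }

# Hypothetical sample: two reviewers score the same three models independently.
reviewer_a = {
    "cecl_pd": Scores(5, 5, 4, 4),
    "research_pd": Scores(1, 0, 1, 1),
    "collections_score": Scores(3, 0, 2, 2),
}
reviewer_b = {
    "cecl_pd": Scores(5, 5, 5, 3),
    "research_pd": Scores(1, 0, 1, 1),
    "collections_score": Scores(2, 0, 2, 2),
}

print(disagreements(reviewer_a, reviewer_b))
```

Any model that shows up in the result marks an input definition loose enough to be read two ways, which is the part of the rubric to tighten before it is applied to the full inventory.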

Defining triggers that actually fire

A trigger is a pre-committed threshold that converts a monitoring signal into a revalidation obligation. The discipline is committing to the threshold before you're staring at a number you'd rather not act on. The four trigger families that matter for credit models:

Performance degradation. A drop in discriminatory power (a Gini or KS decline past a set band), or a calibration break — predicted PDs drifting materially from realized defaults. Set the band; when it's breached, the trigger fires.
Population shift. The input population stops looking like the development sample, signaled by a population stability index (PSI) past a threshold on the key drivers. This is the trigger that catches the post-product-change drift the annual calendar caught only by luck.
Environmental change. The macro regime moves outside what the model saw in development. A model built and calibrated through a benign-credit window has not seen the environment it's now being asked to predict in. This is exactly the trigger a K-shaped consumer environment ought to fire — segment-level deterioration that the portfolio average hides.
Structural change. A model change, an input-data-source change, or a change to a model the output depends on. These are binary and the easiest to govern; they're also the easiest to forget when the change happens three teams away.
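The two quantitative families above can be sketched as pre-committed checks. This is a minimal illustration, not a prescribed method: the PSI formula is the standard one over binned proportions, but the 0.25 PSI threshold (a commonly cited rule of thumb), the 0.05 absolute Gini band, and the function names are assumptions to replace with your own documented thresholds.

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population stability index between two binned distributions.

    `expected` and `actual` are per-bin proportions over the same bins:
    development sample vs. current population.
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to avoid log(0)
        total += (a - e) * math.log(a / e)
    return total

def fired_triggers(dev_gini, current_gini, dev_bins, current_bins,
                   max_gini_drop=0.05, max_psi=0.25):
    """Names of the trigger families whose pre-committed threshold is breached."""
    fired = []
    if dev_gini - current_gini > max_gini_drop:
        fired.append("performance_degradation")
    if psi(dev_bins, current_bins) > max_psi:
        fired.append("population_shift")
    return fired

# Hypothetical driver: evenly binned at development, skewed toward the top bin today.
dev_bins = [0.25, 0.25, 0.25, 0.25]
current_bins = [0.05, 0.15, 0.30, 0.50]

print(round(psi(dev_bins, current_bins), 3))
print(fired_triggers(0.62, 0.55, dev_bins, current_bins))
```

The thresholds are function defaults here only for brevity. In practice they belong in a versioned policy document so that the commitment demonstrably predates the monitoring result.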

[DIAGRAM: Monitoring-to-trigger-to-revalidation flow — left column the four trigger families each with its metric and threshold; center a decision gate "threshold breached?"; right two paths, "log and continue" vs "open revalidation, notify model owner + validation + the tier's governance forum." Show the Tier 1 outer-bound timer feeding the same gate so the annual floor and the triggers converge on one queue.]

The point of drawing it as one flow: the calendar didn't disappear so much as it became one input among several into a single revalidation queue. For Tier 1 it's still there as a backstop. For everything below, the triggers do the work the calendar used to pretend to do.
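The single-queue idea can be sketched in a few lines. The outer bounds below follow the illustrative rubric comments (annual floor for Tier 1, 24-month bound for Tier 2, trigger-only for Tier 3). The model records, dates, and function names are hypothetical.

```python
from datetime import date

# Outer bounds in days per tier; None means no calendar backstop (illustrative values).
OUTER_BOUND_DAYS = {"Tier 1": 365, "Tier 2": 730, "Tier 3": None}

def revalidation_reasons(tier, last_validated, today, fired):
    """Reasons a model belongs in the queue; an empty list means log and continue."""
    reasons = list(fired)
    bound = OUTER_BOUND_DAYS[tier]
    if bound is not None and (today - last_validated).days >= bound:
        reasons.append("outer_bound_elapsed")
    return reasons

def build_queue(models, today):
    """One queue fed by both the fired triggers and the tier's calendar backstop."""
    queue = {}
    for m in models:
        reasons = revalidation_reasons(m["tier"], m["last_validated"], today, m["fired"])
        if reasons:
            queue[m["name"]] = reasons
    return queue

# Hypothetical inventory snapshot.
models = [
    {"name": "cecl_pd", "tier": "Tier 1", "last_validated": date(2025, 5, 1), "fired": []},
    {"name": "auto_scorecard", "tier": "Tier 2", "last_validated": date(2025, 9, 1),
     "fired": ["population_shift"]},
    {"name": "research_pd", "tier": "Tier 3", "last_validated": date(2023, 1, 1), "fired": []},
]

print(build_queue(models, today=date(2026, 6, 8)))
```

The Tier 3 model stays out of the queue despite its age because nothing fired. That is defensible only if the monitoring that would have fired a trigger is actually running and logged.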

Implementation: the EUC register and the validator independence principle

Two pieces of the build deserve special attention because they're where the narrowed model definition and the non-enforceable framing do the most quiet damage.

The end-user-computing register. When the model definition narrows, a population of spreadsheets, lookup tables, and deterministic adjustments falls out of model governance. They do not fall out of your process. An override grid that sits on top of a scorecard, a manual CECL adjustment, a hard-coded cutoff — these now sit in a governance gap: too simple to be a model under the new definition, too consequential to be ungoverned. Build an end-user-computing register that captures them, assigns an owner, and subjects them to a lighter-touch control that's proportionate to their consequence. The examiner who finds a material manual adjustment governed by nothing will not be impressed that it technically wasn't a model.
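A register like this can start very small. The sketch below assumes a hypothetical `EucItem` record and illustrative review intervals scaled to consequence. None of the field names or intervals come from the guidance itself.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EucItem:
    name: str
    kind: str                         # e.g. override grid, manual adjustment, hard-coded cutoff
    owner: Optional[str]              # None = nobody accountable yet
    consequence: int                  # 1 (low) to 5 (high); hypothetical banding
    days_since_review: Optional[int]  # None = never reviewed

def review_interval_days(consequence):
    """Lighter-touch control scaled to consequence (illustrative intervals)."""
    if consequence >= 4:
        return 180
    if consequence >= 2:
        return 365
    return 730

def control_gaps(register):
    """List (item name, issue) pairs for items that are unowned, unreviewed, or overdue."""
    gaps = []
    for item in register:
        if not item.owner:
            gaps.append((item.name, "no owner assigned"))
        if item.days_since_review is None:
            gaps.append((item.name, "never reviewed"))
        elif item.days_since_review > review_interval_days(item.consequence):
            gaps.append((item.name, "review overdue"))
    return gaps

# Hypothetical register entries.
register = [
    EucItem("scorecard override grid", "override grid", "credit policy", 4, 90),
    EucItem("CECL qualitative adjustment", "manual adjustment", None, 5, 400),
    EucItem("auto-decline cutoff", "hard-coded cutoff", "originations", 3, None),
]

for name, issue in control_gaps(register):
    print(f"{name}: {issue}")
```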

Validator independence as a principle, not an org chart. The old guidance leaned on structural independence — validation reports up a different chain than development. The new framework treats independence as a principle to be demonstrated through effective challenge, which is both more honest and more demanding. A validator in a separate box who rubber-stamps is not independent; a validator embedded close to the model who genuinely pushes back and can show the documented disagreements is. You now have to evidence the challenge — the questions asked, the pushback given, the things the developer changed because the validator wouldn't sign off. Independence you can't show in the file is independence the examiner won't credit.
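Evidence of challenge is easier to produce when it is captured as structured records rather than reconstructed from email. The sketch below is one hypothetical shape for such a log. The outcome categories and the sign-off check are assumptions for illustration, not requirements drawn from the guidance.

```python
from dataclasses import dataclass

VALID_OUTCOMES = {"model_changed", "accepted_with_rationale", "open"}

@dataclass(frozen=True)
class Challenge:
    question: str            # what the validator asked or pushed back on
    developer_response: str  # what the developer said or did in reply
    outcome: str             # one of VALID_OUTCOMES

def challenge_summary(challenges):
    """Count challenges by outcome, rejecting any outcome outside the agreed set."""
    counts = {outcome: 0 for outcome in sorted(VALID_OUTCOMES)}
    for c in challenges:
        if c.outcome not in VALID_OUTCOMES:
            raise ValueError(f"unknown outcome: {c.outcome}")
        counts[c.outcome] += 1
    return counts

def ready_for_signoff(challenges):
    """Hypothetical file check: at least one documented challenge and none still open."""
    return len(challenges) > 0 and challenge_summary(challenges)["open"] == 0

# Hypothetical validation file for a PD model.
log = [
    Challenge("Why is the 2020 cohort excluded from development?",
              "Re-ran with the cohort included and updated the calibration.",
              "model_changed"),
    Challenge("Is the low-default segment large enough to support its own PD?",
              "Provided sample sizes and a pooled-segment comparison.",
              "accepted_with_rationale"),
    Challenge("How does the model behave under a segment-level downturn?",
              "Analysis pending.",
              "open"),
]

print(challenge_summary(log))
print(ready_for_signoff(log))
```

An empty log fails the check by design: a file with no recorded questions looks the same to an examiner as a rubber stamp.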

Four ways teams will get this wrong

1. Reading "non-enforceable" as "do less." The framing is a statement about legal force, not about supervisory expectation. Teams that trim their program because "it's only guidance" are setting up the exam finding that takes the longest to remediate, because remediation means rebuilding the program you dismantled.

2. Treating the gen-AI carve-out as a free pass. The carve-out is the supervisors admitting they don't yet have validation expectations for generative and agentic systems. A system you can't validate under the existing framework, operating in a space the supervisors flagged as an open question, is a higher-risk system, not an unregulated one. Govern it as if the RFI's eventual answer will be demanding, because it will be.

3. Tiering that can't survive a skeptical read. If your rubric produces tiers that conveniently minimize validation cost, you don't have a tiering framework, you have a budget dressed as one. The remedy is to lock the inputs before you see the outputs and to have two people run the rubric independently on a sample.

4. Multi-entity reconciliation drift. A bank holding company sits under the Fed; its national bank subsidiary under the OCC; a state nonmember bank subsidiary under the FDIC. The three instruments were issued together and are substantively aligned, but you will still get asked to reconcile your single model-risk framework against all three, and to show it satisfies each. Build the framework once, map it to all three instruments explicitly, and keep the crosswalk current. The drift happens when one regulator updates its instrument and your crosswalk doesn't follow.
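Crosswalk drift is mechanical enough to check automatically. The sketch below assumes a hypothetical structure in which each framework control records the date it was last mapped to each instrument. The control names, mapping dates, and the later OCC revision date are invented for illustration.

```python
from datetime import date

INSTRUMENTS = ("SR 26-2", "OCC 2026-13", "FIL-15-2026")

def crosswalk_gaps(crosswalk, latest_revision):
    """Find controls that are unmapped, or mapped before an instrument's latest revision.

    crosswalk: {control: {instrument: date the mapping was last reviewed}}
    latest_revision: {instrument: date of the most recent issuance or update}
    """
    gaps = []
    for control, mapping in crosswalk.items():
        for instrument in INSTRUMENTS:
            mapped = mapping.get(instrument)
            if mapped is None:
                gaps.append((control, instrument, "not mapped"))
            elif mapped < latest_revision[instrument]:
                gaps.append((control, instrument, "mapped before latest revision"))
    return gaps

issued = date(2026, 4, 17)  # joint issuance date cited in this article
latest_revision = {
    "SR 26-2": issued,
    "OCC 2026-13": date(2027, 1, 15),  # hypothetical later update, for illustration only
    "FIL-15-2026": issued,
}
crosswalk = {
    "materiality tiering": {i: date(2026, 5, 1) for i in INSTRUMENTS},
    "trigger monitoring": {"SR 26-2": date(2026, 5, 1), "OCC 2026-13": date(2027, 2, 1)},
}

for gap in crosswalk_gaps(crosswalk, latest_revision):
    print(gap)
```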

The takeaway

SR 26-2 didn't make model risk management lighter. It made it honest. The calendar was a way of not having to defend your judgment about which models matter and when they need another look. That defense is now the job. For a credit risk function, the work is to build a tiering rubric you'd be comfortable handing to a skeptical examiner, a trigger layer that fires before a degraded model causes a loss, and a governance net that catches the things the narrowed model definition lets slip. Do that, and the end of the annual revalidation is a genuine upgrade — capacity moves to where the risk actually is. Skip it, and you've traded a blunt instrument that worked by accident for no instrument at all.

