
Is Data Lineage Required for Compliance? What Regulators Actually Expect

March 13, 2026

The short answer is no: most regulators don't use the words "data lineage" in their rules. The practical answer is another matter. Try getting through an SEC examination or OCC review without it.

We've worked with teams on both sides of this. Teams that had lineage and could point to the source of every number in their regulatory reports. And teams that didn't — who spent weeks reconstructing how a figure was calculated because the person who built the spreadsheet left two years ago.

The teams with lineage passed their exams. The teams without lineage passed too, eventually. But the cost, in time, in stress, and in consultant hours, was staggering.

What the Regulations Actually Say

Three frameworks matter here. None of them say "you must have data lineage." All of them require things that are practically impossible without it.

BCBS 239: Risk Data Aggregation

BCBS 239 — the Basel Committee's principles for effective risk data aggregation — is the closest thing to a lineage mandate in financial regulation.

Principle 3 (Accuracy and Integrity) requires that banks be able to generate accurate and reliable risk data. Principle 4 (Completeness) requires that they capture and aggregate all material risk data. Principle 7 (Accuracy, applied to risk management reports) requires that reports be reconciled and validated, which in practice means they must be reconcilable to source data.

That last one is the key. Reconcilable to source data. If you can't trace a number in a risk report back to the data that produced it — through every transformation, aggregation, and calculation step — you can't reconcile it. And if you can't reconcile it, you're not compliant with Principle 7.

The standard doesn't say "build a lineage system." But the requirement it describes — tracing data from origin to report — is data lineage by definition.
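To make "tracing data from origin to report" concrete, here is a minimal sketch of what that trace could look like if lineage is stored as a simple graph. The node names and the `trace_to_sources` helper are hypothetical, invented for illustration; they don't come from BCBS 239 or from any particular lineage tool.

```python
from collections import deque

# Hypothetical lineage graph: each node maps to the upstream nodes it was
# derived from. Names are illustrative, not from a real reporting system.
LINEAGE = {
    "risk_report.total_exposure": ["agg.exposure_by_desk"],
    "agg.exposure_by_desk": ["clean.positions", "ref.fx_rates"],
    "clean.positions": ["raw.trading_system_positions"],
    "ref.fx_rates": ["raw.vendor_fx_feed"],
    "raw.trading_system_positions": [],
    "raw.vendor_fx_feed": [],
}


def trace_to_sources(node, lineage):
    """Walk upstream from a reported figure and return its original sources.

    Raises KeyError when a step has no lineage recorded, because a missing
    link means the figure cannot be reconciled to source data.
    """
    seen = {node}
    queue = deque([node])
    sources = []
    while queue:
        current = queue.popleft()
        if current not in lineage:
            raise KeyError(f"lineage gap: nothing recorded for {current!r}")
        parents = lineage[current]
        if not parents:
            # No upstream dependencies: this is an original source.
            sources.append(current)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(sources)


sources = trace_to_sources("risk_report.total_exposure", LINEAGE)
print(sources)
# ['raw.trading_system_positions', 'raw.vendor_fx_feed']
```

The design choice worth noting is the hard failure on a missing node. A trace that silently stops at an undocumented step looks complete when it isn't, which is exactly the situation an examiner will find.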

SR 11-7: Model Risk Management

The Fed's SR 11-7 guidance covers model risk management — and it has direct implications for data lineage.

SR 11-7 requires that model documentation include "data inputs, transformations performed on those inputs, and the rationale for those transformations." It requires that model validation include "assessment of data quality and relevance." And it requires ongoing monitoring to ensure that "model performance does not deteriorate as conditions change."

Every one of those requirements gets harder without lineage. If you can't trace a model's inputs to their sources, you can't validate data quality. If you can't document the transformations, you can't explain the rationale. If you can't track how inputs change over time, you can't monitor for deterioration.

SR 11-7 doesn't mandate lineage. It mandates things that lineage makes possible and that are expensive to do any other way.
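One way to picture the documentation burden: treat each model input as a record with a source, a transformation, and a rationale, then flag whatever is missing. This is only a sketch under our own assumptions. The `ModelInput` fields, the `documentation_gaps` helper, and the sample inputs are hypothetical and are not a schema prescribed by SR 11-7.

```python
from dataclasses import dataclass

# Fields we assume a reviewer would want documented for every model input.
REQUIRED = ("source", "transformation", "rationale")


@dataclass(frozen=True)
class ModelInput:
    name: str
    source: str          # where the input comes from; empty if unknown
    transformation: str  # what was done to the raw data
    rationale: str       # why that transformation was chosen


def documentation_gaps(inputs):
    """Return {input name: [missing fields]} for inputs with incomplete docs."""
    gaps = {}
    for item in inputs:
        missing = [f for f in REQUIRED if not getattr(item, f).strip()]
        if missing:
            gaps[item.name] = missing
    return gaps


# Illustrative inputs only; none of these describe a real model.
inputs = [
    ModelInput("pd_estimate", "raw.loan_tape",
               "winsorized at 1st/99th percentile", "limit outlier influence"),
    ModelInput("ltv_ratio", "raw.appraisals", "", ""),
    ModelInput("macro_index", "", "lagged one quarter",
               "align with reporting date"),
]

gaps = documentation_gaps(inputs)
print(gaps)
# {'ltv_ratio': ['transformation', 'rationale'], 'macro_index': ['source']}
```

With lineage captured as data, a check like this runs in seconds. Without it, the same check is a round of interviews.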

SEC Examination Standards

The SEC's examination program looks at whether firms have "policies and procedures reasonably designed to ensure the accuracy and completeness" of data used in client communications, regulatory filings, and investment decisions.

In practice, this means examiners ask questions like: "Where did this number come from?" "How was it calculated?" "Can you show me the source?"

Without lineage, answering those questions requires manual reconstruction — opening files, tracing spreadsheet formulas, finding the original data pull. With lineage, it's a lookup. Same answer, radically different cost.

The Practical Reality

Here's the thing that most compliance teams figure out eventually: the question isn't whether lineage is "required." The question is what it costs when you don't have it.

Without lineage, every audit response involves archaeology. Someone has to reconstruct the calculation chain from memory, email threads, and file timestamps. This takes weeks, burns senior staff time, and introduces risk — because reconstructed lineage is inherently less reliable than lineage captured at the time of calculation.

With lineage, audit response is a lookup. The number traces to its source. The methodology is documented. The temporal context is preserved. The examiner gets their answer and moves on.

The cost difference is dramatic. We've seen teams spend 400+ hours preparing for a single exam that a team with lineage handles in under 40. That's a 10x difference, and it recurs every exam cycle.

What Good Data Lineage Looks Like in Practice

This isn't theoretical for us. We build data lineage for SEC filings, structured finance data, and regulatory reporting. Here's what it actually looks like:

Source attribution — every extracted data point carries its EDGAR accession number, filing date, and document location. An examiner asks "where did this come from?" and the answer is a direct link to the source filing.

Transformation history — when we calculate a metric from raw filing data, the calculation methodology is versioned and timestamped. If the methodology changes, both versions are preserved with their effective dates.

Temporal context — the system captures what was known when. If a filing gets amended, the original and the amendment both exist in the timeline. Decisions made before the amendment can be explained in the context of what was available at the time.

Entity resolution — the same company appearing as "NVIDIA Corp." in one filing and "NVIDIA Corporation" in another resolves to a single canonical entity. This prevents the kind of identity confusion that creates compliance risk.

Known gaps — when data can't be extracted cleanly — formatting issues, missing tables, ambiguous structures — that's captured in the metadata. The system knows what it doesn't know.
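The five properties above can be sketched as a single record type plus an "as of" lookup. This is a simplified illustration, not the actual DealCharts schema: the field names, the `entity:nvidia` identifier, the placeholder accession numbers, and the metric values are all invented, and the filing date stands in for the date a value became known.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DataPoint:
    entity_id: str                   # canonical entity after resolution
    metric: str
    value: Optional[float]           # None when extraction failed
    accession_number: str            # source attribution (placeholder values)
    filing_date: date                # proxy for "when this became known"
    location: str                    # where in the document it was found
    method_version: str              # version of the calculation methodology
    known_gap: Optional[str] = None  # why the value is missing, if it is


# Hypothetical alias table mapping name variants to one canonical entity.
ALIASES = {
    "nvidia corp.": "entity:nvidia",
    "nvidia corporation": "entity:nvidia",
}


def canonical_entity(name):
    return ALIASES[name.strip().lower()]


def as_of(points, entity_id, metric, when):
    """Return the latest filed data point that was available on `when`."""
    candidates = [
        p for p in points
        if p.entity_id == entity_id and p.metric == metric
        and p.filing_date <= when
    ]
    return max(candidates, key=lambda p: p.filing_date, default=None)


points = [
    # Original filing, then an amendment that restates the same metric.
    DataPoint(canonical_entity("NVIDIA Corp."), "example_metric", 100.0,
              "0000000000-25-000001", date(2025, 3, 1), "Exhibit A, table 2", "v1"),
    DataPoint(canonical_entity("NVIDIA Corporation"), "example_metric", 95.0,
              "0000000000-25-000002", date(2025, 6, 1), "Exhibit A, table 2", "v1"),
    # A known gap: the value is absent, and the record says why.
    DataPoint(canonical_entity("NVIDIA Corp."), "other_metric", None,
              "0000000000-25-000001", date(2025, 3, 1), "Exhibit B", "v1",
              known_gap="table missing from filing"),
]

before = as_of(points, "entity:nvidia", "example_metric", date(2025, 4, 15))
after = as_of(points, "entity:nvidia", "example_metric", date(2025, 7, 1))
print(before.value, before.accession_number)  # 100.0 0000000000-25-000001
print(after.value, after.accession_number)    # 95.0 0000000000-25-000002
```

A decision made in April can be explained with the April view of the data, even though the amendment later changed the number. Nothing is overwritten; the amendment is simply a later entry in the timeline.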

This is what DealCharts does for structured finance data — 1,185 CMBS and ABS deals with full provenance, every number traceable to its source filing. It's the same pattern we apply to any regulated dataset.

For AI Systems, Lineage Is Non-Negotiable

The regulatory dimension gets more interesting when AI enters the picture. If your AI system produces a number that shows up in a regulatory report or client communication, the question "where did that come from?" now has two layers: where did the data come from, and what did the model do with it?

Without data lineage, an AI system is a black box producing numbers that no one can trace. With lineage, the AI output connects to its inputs, the inputs connect to their sources, and the whole chain is auditable.
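As a rough illustration of connecting an AI output to its inputs, the sketch below stores a content fingerprint of every input alongside the output, so a reviewer can later confirm the output was produced from exactly that data. The record layout, the `record_output` and `inputs_match` helpers, the model name, and the source reference format are assumptions made for this example.

```python
import hashlib
import json


def fingerprint(payload):
    """SHA-256 over a canonical JSON encoding of the payload."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def record_output(model_id, model_version, inputs, output):
    """Build an audit record tying an output to the inputs that produced it."""
    return {
        "model_id": model_id,
        "model_version": model_version,
        "inputs": [
            {"source_ref": i["source_ref"], "fingerprint": fingerprint(i["data"])}
            for i in inputs
        ],
        "output": output,
    }


def inputs_match(record, inputs):
    """Check that the supplied inputs are the ones the record was built from."""
    expected = [(e["source_ref"], e["fingerprint"]) for e in record["inputs"]]
    actual = [(i["source_ref"], fingerprint(i["data"])) for i in inputs]
    return expected == actual


# Illustrative data; the source reference is a placeholder, not a real filing.
inputs = [
    {"source_ref": "filing:0000000000-25-000001#table2",
     "data": {"example_metric": 100.0}},
]
record = record_output("hypothetical-summarizer", "1.0", inputs,
                       {"reported_value": 100.0})
tampered = [
    {"source_ref": inputs[0]["source_ref"], "data": {"example_metric": 101.0}},
]

print(inputs_match(record, inputs))    # True
print(inputs_match(record, tampered))  # False
```

This covers only the first layer of the question, where the data came from. What the model did with it still needs its own documentation, but without this layer there is nothing to anchor that explanation to.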

As regulators increase their focus on AI in financial services, and they are doing so, the expectation that firms can explain AI outputs will only get stronger. Data lineage is the foundation of that explainability.

The Bottom Line

Data lineage isn't technically required. It's just that everything regulators do require — tracing data to sources, documenting transformations, reconciling reports, validating model inputs — is impractical at scale without it.

You can do compliance without lineage. You'll just spend 10x the time, 10x the money, and carry 10x the risk every time someone asks where a number came from.

The goal is to trace every number to its source, to document every transformation, and to preserve the context of every decision, all in a way that an examiner, or an agent, can verify independently.

That's not a compliance program. That's a competitive advantage.


Related:

  • How to Get Your Data Ready for AI — Practical guide to AI data preparation
  • AI Audit Trail — What evidence looks like when your system gets questioned
  • BCBS 239 Data Lineage — A practical guide to the standard
  • AI Provenance & Data Lineage — How CMD+RVL approaches lineage
Zac Ruiz

Co-Founder

Technology leader with 25+ years' experience, including a decade in securitization and capital markets.

© 2026 CMD+RVL. All rights reserved.