AI Readiness Assessment: How to Know If Your Data Is Ready
March 13, 2026

Every firm we talk to is "doing AI." Most of them are stuck in the same place — the model works fine, but nobody trusts the outputs because the data underneath has no provenance, no structure, and no lineage.
An AI readiness assessment is how you figure out where you actually stand before spending six months on a project that stalls at the data layer.
What an AI Readiness Assessment Actually Is
It's not a maturity model. It's not a 50-question survey that produces a score nobody acts on. It's not a consulting engagement that ends with a PowerPoint deck.
An AI readiness assessment is a structured evaluation of whether your data can support AI systems that produce defensible outputs. Defensible meaning: auditable, traceable, reproducible. The kind of outputs you can hand to a regulator, a PM, or an LP and say "here's where this came from."
The assessment looks at four dimensions:
Lineage — can you trace every number to its source? Not "this came from the data warehouse" — the actual source document, filing, feed, or system that produced it.
Structure — can a machine navigate your data without a human explaining the schema? Can an agent start with a company name and walk to its filings, its metrics, its history — without someone writing custom queries?
Temporal context — does your data capture what was true when? Or does it only reflect the current state, with history overwritten by updates?
Documentation — are methodologies, transformations, and known limitations captured with the data? Or do they live in someone's head?
Why Most AI Projects Fail at the Data Layer
The pattern is pretty consistent. A team picks an AI use case — summarize these filings, flag these anomalies, generate these reports. They build or buy a model. They connect it to data. And the first output looks promising.
Then someone asks where the number came from. And the answer is "the model produced it from the data in the warehouse." Which is not an answer — it's a black box with extra steps.
The model was never the problem. The data was the problem. Specifically:
No source attribution — the data exists but nobody documented where each record originated. The analyst who built the pipeline knows. The AI system doesn't.
Flat structure — the data is in tables that humans query with SQL. An agent can't traverse relationships because the relationships aren't encoded in the data — they're encoded in the analyst's mental model.
No temporal versioning — when a filing gets amended or a rate gets revised, the old value is overwritten. The AI system makes decisions based on current data, but the decision it made last month was based on data that no longer exists in the system.
Tribal knowledge — the methodology for calculating a key metric is known by three people. None of them documented it. The AI system inherits the methodology without understanding it.
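The overwrite problem is the easiest of these to see in code. A minimal sketch, with illustrative field names, of what versioning instead of overwriting looks like: an amendment appends a new row and marks the old one superseded, so the state the system saw on any past date can still be reconstructed.

```python
from datetime import date

history: list[dict] = []

def record_value(value: float, effective: date) -> None:
    """Append a new version; mark the previous one superseded instead of deleting it."""
    if history:
        history[-1]["superseded_on"] = effective.isoformat()
    history.append({"value": value, "effective": effective.isoformat(),
                    "superseded_on": None})

record_value(100.0, date(2026, 1, 15))   # original filing
record_value(97.5, date(2026, 2, 20))    # amendment; original row preserved

# What the system saw on 2026-02-01 is still reconstructable:
as_of = [r for r in history
         if r["effective"] <= "2026-02-01"
         and (r["superseded_on"] is None or r["superseded_on"] > "2026-02-01")]
```

With an overwrite model, `history` would hold only the amended value and the `as_of` query would be unanswerable.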
How to Run an Assessment
Start small. One dataset. One use case. The goal isn't to assess everything — it's to get a clear picture of one path through your data and understand what's missing.
Step 1: Pick the use case
Choose the AI application you care about most. "Summarize SEC filings for our portfolio companies." "Flag when a CMBS deal's performance deteriorates." "Generate weekly risk reports from our position data." Something specific.
Step 2: Trace the data
For that use case, trace every input back to its source. Literally follow the chain: the AI model takes input X. X comes from table Y. Table Y was populated by pipeline Z. Pipeline Z pulls from source A.
Document every step. Where does the chain break? Where does provenance go missing? Where does someone have to explain context that isn't in the data?
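The trace itself can be documented as data. A minimal sketch, with hypothetical hop names, that records each link in the chain and flags where provenance goes missing:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hop:
    """One link in the chain from model input back to source."""
    name: str
    kind: str                  # "model_input" | "table" | "pipeline" | "source"
    provenance: Optional[str]  # the upstream this hop pulls from, if documented

def find_breaks(chain: list[Hop]) -> list[str]:
    """Return the hops where the chain breaks: no documented upstream."""
    return [h.name for h in chain if h.provenance is None and h.kind != "source"]

chain = [
    Hop("input_x", "model_input", "warehouse.table_y"),
    Hop("warehouse.table_y", "table", "pipeline_z"),
    Hop("pipeline_z", "pipeline", None),   # nobody documented the upstream feed
    Hop("source_a", "source", None),       # system of record; chain ends here
]

print(find_breaks(chain))  # → ['pipeline_z']
```

Even this crude a record is useful: the break points are exactly the lines someone currently explains from memory.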
Step 3: Score each dimension
For the data involved in your use case, score each of the four dimensions:
Lineage — can you trace every record to its source document or system of record? Score: full (every record), partial (some records), or none.
Structure — can a machine traverse from entity to related data without custom queries? Score: navigable (graph or well-connected schema), queryable (requires SQL knowledge), or flat (requires human interpretation).
Temporal context — does the data preserve historical state? Score: versioned (full history preserved), snapshots (periodic captures), or overwrite (only current state).
Documentation — are transformations, methodologies, and limitations captured? Score: embedded (in the data/pipeline), external (separate docs), or tribal (in someone's head).
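The rubric above can be captured as a small scoring sketch (level names taken from the four dimensions; the structure itself is illustrative):

```python
# Ordered worst-to-best, per the rubric above.
LEVELS = {
    "lineage":       ["none", "partial", "full"],
    "structure":     ["flat", "queryable", "navigable"],
    "temporal":      ["overwrite", "snapshots", "versioned"],
    "documentation": ["tribal", "external", "embedded"],
}

def score(assessment: dict) -> dict:
    """Rank each dimension 0-2 and flag anything below the top level as a gap."""
    out = {}
    for dim, level in assessment.items():
        rank = LEVELS[dim].index(level)
        out[dim] = {"level": level, "rank": rank,
                    "gap": rank < len(LEVELS[dim]) - 1}
    return out

result = score({
    "lineage": "partial",
    "structure": "queryable",
    "temporal": "overwrite",
    "documentation": "tribal",
})
```

The output doubles as the gap map for Step 4: every dimension with `gap: True` is a place AI will break.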
Step 4: Identify the gaps
The gaps are where AI will break. If lineage is missing, outputs can't be audited. If structure is flat, agents can't navigate. If temporal context is gone, historical decisions can't be reconstructed. If documentation is tribal, the AI system is building on assumptions nobody wrote down.
Step 5: Prioritize
You don't need to fix everything. Fix the gaps that matter for your target use case. If your use case is SEC filing summarization, lineage and temporal context are critical. If it's risk reporting, all four dimensions matter. If it's exploratory analysis, structure matters most.
What Good Looks Like
We've run this assessment for our own data — SEC filings, structured finance deals, regulatory data — and here's what the "after" state looks like:
Lineage: every extracted data point carries its EDGAR accession number, filing date, and source location. Trace any number to the exact document.
Structure: an agent can start with a fund name, navigate to its holdings, find a specific security, and walk to the source filing. The whole path is traversable.
Temporal context: when a filing gets amended, both the original and amendment are preserved with their dates. Decisions made before the amendment can be explained in context.
Documentation: extraction methodology is versioned and timestamped. Known limitations — parsing edge cases, formatting anomalies, missing tables — are captured in metadata.
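Concretely, the "after" state means every record carries its own provenance. A sketch of what such a record might look like — field names and values are illustrative, not a real schema — plus a completeness check:

```python
# Illustrative record shape: value + lineage + temporal context + methodology.
record = {
    "value": 412_500_000,
    "metric": "total_assets",
    "lineage": {
        "accession_number": "0001234567-26-000123",  # hypothetical EDGAR accession
        "filing_date": "2026-02-14",
        "source_location": "10-K, Item 8, balance sheet",
    },
    "temporal": {
        "as_of": "2025-12-31",
        "superseded_by": None,  # set when an amendment arrives; original is kept
    },
    "methodology": {
        "extractor_version": "2.3.1",
        "known_limitations": ["nested-table parsing"],
    },
}

REQUIRED_LINEAGE = {"accession_number", "filing_date", "source_location"}

def lineage_complete(rec: dict) -> bool:
    """True when the record carries everything needed to trace it to its source."""
    return REQUIRED_LINEAGE <= set(rec.get("lineage", {}))
```

A check like `lineage_complete` can run in the pipeline itself, so records without full provenance never reach the AI system silently.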
This is what DealCharts runs on — 1,185 deals, 7,385 charts, every number traceable. It took time to build. But once the pattern was established, adding new datasets became dramatically faster. The assessment work compounds.
Common Assessment Findings
After running these for a while, the findings tend to cluster:
"We have lineage — but only inside the warehouse." The transformation chain from raw tables to reporting tables is documented. But the chain from source systems to raw tables is a mystery. Fix: extend lineage to the source boundary.
"Our data is structured — for humans." The schema is well-designed for SQL queries. But an agent can't traverse entity relationships without someone writing the joins. Fix: encode relationships explicitly, or build a graph layer.
"We version data — monthly." Monthly snapshots exist, but daily changes within a month are overwritten. Fix: implement event-level versioning or at minimum daily snapshots for AI-relevant data.
"The methodology is documented — in Confluence." Documentation exists but isn't connected to the data. When the methodology changes, the data and the docs drift apart. Fix: embed methodology metadata in the pipeline output, version it with the data.
The Assessment as a Workshop
We run AI readiness assessments as half-day workshops. Bring the data team, bring the use case, and we'll trace the data chain together — live, on your actual data. By the end, you have a scored assessment, a gap map, and a prioritized list of what to fix.
The assessment itself takes a few hours. The value is knowing exactly where your data breaks before you discover it six months into an AI project.
Related:
- How to Get Your Data Ready for AI — The practical guide to making data AI-ready
- AI Audit Trail — What evidence looks like when your system gets questioned
- Is Data Lineage Required for Compliance? — What regulators actually expect
- Book a Workshop — Run an assessment on your data

Zac Ruiz
Co-Founder
Technology leader with 25+ years' experience, including a decade in securitization and capital markets.
LinkedIn →