
Your Biggest EU AI Act Compliance Risk Isn't the Model. It's the Data Underneath It.

Micah Horner, updated on July 24, 2026

EU AI Act

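The EU AI Act's high-risk AI obligations are coming, and the fines are real: penalties under the Act top out at €35 million or 7% of global annual turnover for prohibited practices, and breaches of high-risk obligations such as Article 10 can cost up to €15 million or 3%. But the organisations most at risk are not the ones without AI policies. They are the ones that have AI in production and have not yet confronted Article 10.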

Article 10 is not about the model. It is about the data underneath it.

What Article 10 actually requires

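Article 10 of the EU AI Act applies to high-risk AI systems. Annex III lists the main use cases, including credit scoring of individuals, HR screening, emergency patient triage, and AI used as a safety component in critical infrastructure. Two caveats matter here: Annex III excludes AI used to detect financial fraud from its credit scoring entry, and clinical decision support typically becomes high-risk through the medical device rules referenced in Annex I rather than through Annex III.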

For every system in that category, Article 10 requires that training, validation, and testing datasets are subject to documented data governance practices. "Documented" is the operative word. "We reviewed the data" is not sufficient. You need an auditable trail: where the data came from, how it was prepared, what quality checks ran, what biases were examined, and what was done about them.
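As a rough illustration of what such a trail could look like, the sketch below records provenance, preparation steps, quality checks, and bias findings for one dataset snapshot, and fingerprints the rows so the record can be tied to the exact data the model trained on. Everything here is hypothetical: `DatasetRecord`, `fingerprint`, the field names, and the sample values are assumptions made for illustration, and Article 10 does not prescribe any particular format.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


def fingerprint(rows: list[dict]) -> str:
    """Return a SHA-256 hash of the rows that ignores key order and row order."""
    canonical = sorted(json.dumps(r, sort_keys=True) for r in rows)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


@dataclass
class DatasetRecord:
    """Hypothetical audit record for one training dataset snapshot."""
    name: str
    source: str
    preparation_steps: list[str]
    quality_checks: list[str]
    bias_findings: list[str]
    mitigations: list[str]
    data_sha256: str
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative sample data, not taken from any real system.
rows = [
    {"applicant_id": 1, "income": 42000, "region": "north"},
    {"applicant_id": 2, "income": 38500, "region": "south"},
]

record = DatasetRecord(
    name="credit_training_v1",
    source="hypothetical warehouse table loans.applications",
    preparation_steps=["dropped duplicate applicant_id", "normalised region codes"],
    quality_checks=["income null rate below threshold"],
    bias_findings=["approval rate differs by region"],
    mitigations=["reweighted under-represented regions"],
    data_sha256=fingerprint(rows),
)

print(record.to_json())
```

Writing a record like this at the moment of training, rather than afterwards, is what makes it evidence: the hash lets a reviewer confirm that the documented dataset is the one that was actually used.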

Most organisations have pipelines. Article 10 requires governance. Those are not the same thing.

The data problem most organisations already have

The gap Article 10 exposes is not new. It already exists in most organisations' AI stacks, quietly undermining model reliability long before it becomes a regulatory problem:

If your metric definitions live in tribal knowledge, your AI outputs cannot be audited.
If your transformation logic is scattered across notebooks, BI tool filters, and ad-hoc SQL scripts, you cannot produce the lineage Article 10 requires.
If quality checks are one-time cleanups rather than continuous pipeline monitoring, you cannot prove the dataset was, to the best extent possible, complete and free of errors at the time the model trained.

This is what we have called the context gap: the place where AI projects fail not because the model is wrong, but because the meaning of the data underneath it was never consistent or traceable. Article 10 turns an analytics problem into a legal one. The underlying issue is the same. The stakes are higher.
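One way to picture closing that gap is to move a definition out of tribal knowledge and into a single versioned registry that every pipeline calls. The sketch below is a minimal illustration under assumed names: `Definition`, `REGISTRY`, `apply_definition`, the owner label, and the 90-day rule are all hypothetical and not drawn from any specific product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable


@dataclass(frozen=True)
class Definition:
    """A governed business definition with an owner and a version."""
    name: str
    version: str
    owner: str
    description: str
    rule: Callable[[dict, date], bool]


def _is_active(customer: dict, as_of: date) -> bool:
    # Hypothetical rule: a purchase within the 90 days up to the as-of date.
    return as_of - customer["last_purchase"] <= timedelta(days=90)


REGISTRY = {
    "active_customer": Definition(
        name="active_customer",
        version="1.0.0",
        owner="customer-analytics",
        description="Purchased within the 90 days up to the as-of date.",
        rule=_is_active,
    )
}


def apply_definition(name: str, record: dict, as_of: date) -> dict:
    """Evaluate a governed definition and report which version produced the result."""
    d = REGISTRY[name]
    return {"definition": d.name, "version": d.version, "value": d.rule(record, as_of)}


as_of = date(2026, 7, 24)
recent = {"customer_id": 1, "last_purchase": date(2026, 6, 1)}
lapsed = {"customer_id": 2, "last_purchase": date(2026, 1, 1)}

print(apply_definition("active_customer", recent, as_of))
print(apply_definition("active_customer", lapsed, as_of))
```

Because each result carries the definition's name and version, a feature computed for training can later be traced back to the exact rule that produced it.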

What 'AI-ready data' means for Article 10

This is not a checklist to hand to a vendor. It is a standard to hold your own data infrastructure to. Five requirements separate a dataset that satisfies Article 10 from one that does not:

Measurable, enforced quality: Not a one-time check. Continuous validation with explicit thresholds and named owners. Article 10 expects datasets to be relevant, sufficiently representative and, to the best extent possible, free of errors and complete for the intended purpose, and you have to be able to show it. That requires a production monitoring system, not a cleanup sprint before a release.
Full data lineage: From source to model input, traceable in minutes. When an auditor asks where a training dataset came from, you need to answer in minutes, not weeks. Lineage reconstructed from memory after the fact does not satisfy the standard.
Governed, reusable definitions: If "active customer" means three different things across three systems, your training data is inconsistent by definition. Governed definitions, documented and shared across teams, are the foundation of a defensible dataset.
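Access controls with audit trail: You need to know who accessed what and when. Article 10 explicitly demands strict controls and documented access where special categories of personal data are processed to detect and correct bias, and an auditor will expect comparable discipline across any high-risk training dataset. Field-level controls and a clear access log are not optional for high-risk AI. "We trust our team" is not an auditable answer.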
Operational discipline: Pipelines with monitoring, clear ownership, and documented expectations. Article 10 requires you to show governance was active at the time of training, not reconstructed after the fact. A pipeline that ran without incident does not prove governance. A pipeline with documented checks, owners, and alerts does.
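To make the first and last requirements above more concrete, here is a minimal sketch of a quality check that carries an explicit threshold and a named owner, and that emits a timestamped result on every run. The names (`Check`, `null_rate`, `run_checks`), the owners, the thresholds, and the sample rows are hypothetical; a production setup would more likely use a dedicated data quality tool, but the shape of the evidence is similar.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Check:
    """A null-rate check with an explicit threshold and a named owner."""
    name: str
    owner: str
    column: str
    max_null_rate: float


def null_rate(rows: list[dict], column: str) -> float:
    """Share of rows where the column is missing or None."""
    if not rows:
        return 1.0  # treat an empty batch as fully missing so it fails loudly
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)


def run_checks(rows: list[dict], checks: list[Check]) -> list[dict]:
    """Evaluate each check and return timestamped, attributable results."""
    results = []
    for c in checks:
        observed = null_rate(rows, c.column)
        results.append({
            "check": c.name,
            "owner": c.owner,
            "threshold": c.max_null_rate,
            "observed": observed,
            "passed": observed <= c.max_null_rate,
            "ran_at": datetime.now(timezone.utc).isoformat(),
        })
    return results


# Illustrative batch: one of four rows is missing income.
rows = [
    {"income": 42000, "region": "north"},
    {"income": None, "region": "south"},
    {"income": 38500, "region": "east"},
    {"income": 51000, "region": "west"},
]

checks = [
    Check("income_completeness", "risk-data-team", "income", max_null_rate=0.10),
    Check("region_completeness", "risk-data-team", "region", max_null_rate=0.0),
]

results = run_checks(rows, checks)
for r in results:
    print(r["check"], "passed" if r["passed"] else "FAILED", r["observed"])
```

Appending these results to a durable log on every pipeline run is what turns a check into evidence that governance was active at training time, with a named owner to alert when a threshold is breached.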

The industries most exposed

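Financial services: AI used to evaluate the creditworthiness of individuals or set their credit score is explicitly named in Annex III. The same entry carves out AI used to detect financial fraud, so fraud detection and adjacent uses such as anti-money laundering need a case-by-case assessment rather than an assumption either way. If your organisation uses AI for credit decisions or customer risk scoring, expect Article 10 to apply to you. The data governance standard required is higher than most existing MLOps practices, and it needs to be auditable, not merely operational.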
Healthcare: AI systems used in clinical decision support, patient triage, or medical imaging fall within scope. The data governance requirements overlap with existing clinical data obligations, but being GDPR-compliant does not satisfy Article 10's documentation standard. They are parallel obligations, not interchangeable ones.
Manufacturing: AI used in safety-critical applications, including predictive maintenance on critical infrastructure and quality control systems that affect product safety, can qualify as high-risk, either under Annex III (safety components of critical infrastructure) or through the product safety legislation the Act references. The supply chain and operational data these models train on needs to meet the same governance standard as any other high-risk AI dataset.

One thing to do today

The high-risk deadline is not tomorrow, but the foundation it requires takes months to build, not days. Organisations that wait for a confirmed date risk building under pressure. Organisations that treat it as a data infrastructure project, which is what it is, can be ready on any timeline.

Already know your direction and want to see where your data foundation stands? The AR² AI-readiness check takes eight questions and returns your readiness band and the three biggest gaps to close.

Take the AI-readiness check →

This post is for informational purposes and does not constitute legal advice. Organisations should seek qualified legal counsel for jurisdiction-specific compliance guidance.
