“Our data is dirty” comes up in every CDO meeting. The problem is that “dirty” is an abstract verdict, not something a team can act on. To manage data quality in practice, break it down into six dimensions with measurable metrics: accuracy, completeness, consistency, timeliness, uniqueness, and validity. Each has a concrete formula and an operational form you can audit every day.
Six dimensions, six metrics
- Accuracy: records match the real world. Metric: percentage of records that match a verifiable source (e.g. national ID service, IBAN validator).
- Completeness: required fields are populated. Metric: not-null rate plus business-conditional completeness (“corporate customer must have a tax number”).
- Consistency: the same entity looks the same across systems. Metric: percentage of mismatched rows across source systems (e.g. address differences between CRM and core banking).
- Timeliness: data arrives within the expected freshness window. Metric: p95 source-to-analytics lag in minutes, compared against the SLA.
- Uniqueness: an entity is not duplicated. Metric: duplication rate, measured both deterministically (key match) and probabilistically (entity resolution).
- Validity: values conform to type/format/range rules. Metric: count of schema/regex/range violations.
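To make the six metrics concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the sample records, the field names, the two lookup dictionaries standing in for a verifiable source and a second system, and the simple email pattern. The p95 uses the nearest-rank method, and only the deterministic (key-based) duplication rate is shown.

```python
import math
import re

# Hypothetical CRM extract; all field names and values are illustrative.
crm = [
    {"id": 1, "type": "corporate", "tax_no": "1234567890", "email": "a@example.com", "city": "Izmir"},
    {"id": 2, "type": "corporate", "tax_no": None, "email": "b@example.com", "city": "Ankara"},
    {"id": 3, "type": "retail", "tax_no": None, "email": "not-an-email", "city": "Bursa"},
    {"id": 3, "type": "retail", "tax_no": None, "email": "c@example.com", "city": "Bursa"},
]
verified_city = {1: "Izmir", 2: "Istanbul", 3: "Bursa"}  # stand-in for a verifiable source
core_city = {1: "Izmir", 2: "Ankara", 3: "Adana"}        # stand-in for a second system
lag_minutes = [4, 6, 5, 30, 7, 5, 6, 8, 5, 6]            # source-to-analytics lag per load
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")         # deliberately simple format rule


def rate(hits, total):
    """Share of hits in total; an empty population counts as fully passing."""
    return hits / total if total else 1.0


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


n = len(crm)
corporate = [r for r in crm if r["type"] == "corporate"]

metrics = {
    # Accuracy: records that agree with the verifiable source.
    "accuracy": rate(sum(r["city"] == verified_city[r["id"]] for r in crm), n),
    # Completeness: business-conditional rule, corporate customers need a tax number.
    "completeness": rate(sum(r["tax_no"] is not None for r in corporate), len(corporate)),
    # Consistency: share of rows that disagree with the second system.
    "consistency_mismatch": rate(sum(r["city"] != core_city[r["id"]] for r in crm), n),
    # Timeliness: p95 lag in minutes, to be compared against the SLA.
    "timeliness_p95_min": p95(lag_minutes),
    # Uniqueness: deterministic duplication rate on the key.
    "uniqueness_dup_rate": 1 - rate(len({r["id"] for r in crm}), n),
    # Validity: count of format violations.
    "validity_violations": sum(not EMAIL.match(r["email"]) for r in crm),
}
print(metrics)
```

In a real pipeline these calculations would usually run as SQL inside the warehouse; the point here is only that each dimension reduces to a number that can be tracked over time.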
Automation tooling
dbt’s built-in generic tests (unique, not_null, accepted_values, relationships) are the first step; complex business rules go into custom singular tests. Great Expectations and Soda Core suit checks that run outside dbt (for example, before raw data lands in Snowflake). Combining dbt with Soda lets you place checks at every stage of the transformation pipeline.
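As a rough illustration of what the four generic tests assert, the sketch below re-implements their logic in plain Python. This is not dbt code and does not use any dbt API; the function bodies, table contents, and column names are assumptions. It follows the dbt convention that a test returns the failing records and passes when nothing comes back.

```python
from collections import Counter

# Hypothetical tables; names and values are illustrative only.
orders = [
    {"order_id": 1, "customer_id": 10, "status": "placed"},
    {"order_id": 2, "customer_id": 11, "status": "shipped"},
    {"order_id": 2, "customer_id": 99, "status": "lost"},
    {"order_id": None, "customer_id": 10, "status": "placed"},
]
customer_table = [{"customer_id": 10}, {"customer_id": 11}]


def unique(rows, column):
    """Non-null values that occur more than once."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return sorted(value for value, count in counts.items() if count > 1)


def not_null(rows, column):
    """Rows where the column is missing."""
    return [r for r in rows if r[column] is None]


def accepted_values(rows, column, allowed):
    """Rows whose non-null value falls outside the allowed set."""
    return [r for r in rows if r[column] is not None and r[column] not in allowed]


def relationships(child, child_col, parent, parent_col):
    """Child rows whose non-null foreign key has no match in the parent."""
    parent_keys = {r[parent_col] for r in parent}
    return [r for r in child if r[child_col] is not None and r[child_col] not in parent_keys]


failures = {
    "unique": unique(orders, "order_id"),
    "not_null": not_null(orders, "order_id"),
    "accepted_values": accepted_values(orders, "status", {"placed", "shipped", "returned"}),
    "relationships": relationships(orders, "customer_id", customer_table, "customer_id"),
}
for name, failing in failures.items():
    print(name, "PASS" if not failing else f"FAIL ({len(failing)})")
```

The same four functions could run against a raw extract before it reaches the warehouse, which is the gap the standalone tools are meant to cover.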
The Data Contract approach
A paradigm that matured in 2026: a signed agreement between the data producer and the data consumer. The producer commits, via a testable contract, not to break consumers when the schema changes. Open-source implementations are mature as well; Schemata and the Datacontract.com templates lead in practical adoption.
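What "testable" can mean in practice is sketched below: a check that compares a proposed schema against the agreed one and lists changes that would break consumers. The contract layout (field name mapped to type and required flag) and the three breaking-change rules are assumptions made for this example; they are not the format used by Schemata or the Datacontract.com templates.

```python
# Hypothetical contract format: field name -> {"type": ..., "required": ...}.
contract_v1 = {
    "customer_id": {"type": "string", "required": True},
    "tax_no": {"type": "string", "required": False},
    "created_at": {"type": "timestamp", "required": True},
}
proposed_v2 = {
    "customer_id": {"type": "integer", "required": True},    # type changed
    "created_at": {"type": "timestamp", "required": False},  # may now be null
    "segment": {"type": "string", "required": False},        # new optional field
}


def breaking_changes(current, proposed):
    """List changes that could break consumers relying on the current contract."""
    problems = []
    for name, spec in current.items():
        new_spec = proposed.get(name)
        if new_spec is None:
            problems.append(f"{name}: field removed")
            continue
        if new_spec["type"] != spec["type"]:
            problems.append(f"{name}: type changed from {spec['type']} to {new_spec['type']}")
        if spec["required"] and not new_spec["required"]:
            problems.append(f"{name}: required field became optional")
    # Added fields are treated as non-breaking in this sketch.
    return problems


problems = breaking_changes(contract_v1, proposed_v2)
for problem in problems:
    print("BREAKING:", problem)
```

Run in the producer's CI, a non-empty result would block the schema change until consumers have agreed to a new contract version.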
Production SLA
Putting numbers on a dashboard is not enough. Each dimension needs three things: a threshold, an alarm, and an owner. If accuracy drops below 95%, which team responds, within what time, and who escalates? The SLO mindset has reached data teams too; reliability engineering is now a real discipline on the data side.
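The threshold, alarm, and owner can live together in one small record per dimension. The sketch below is one possible shape: the 95% accuracy threshold comes from the example above, while the team names, response times, escalation contact, and the 15-minute lag limit are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualitySLO:
    dimension: str
    threshold: float
    higher_is_better: bool   # accuracy: True; lag in minutes: False
    owner: str               # team that responds (hypothetical names below)
    respond_within_min: int  # expected response time
    escalate_to: str         # who is paged if the owner does not respond

    def breached(self, value):
        """True when the measured value is on the wrong side of the threshold."""
        if self.higher_is_better:
            return value < self.threshold
        return value > self.threshold


slos = [
    QualitySLO("accuracy", 0.95, True, "customer-data-team", 60, "head-of-data"),
    QualitySLO("timeliness_p95_min", 15, False, "platform-team", 30, "head-of-data"),
]


def evaluate(measured, slos):
    """Return one alert per breached SLO, carrying owner and escalation details."""
    alerts = []
    for slo in slos:
        value = measured.get(slo.dimension)
        if value is not None and slo.breached(value):
            alerts.append({
                "dimension": slo.dimension,
                "value": value,
                "threshold": slo.threshold,
                "owner": slo.owner,
                "respond_within_min": slo.respond_within_min,
                "escalate_to": slo.escalate_to,
            })
    return alerts


alerts = evaluate({"accuracy": 0.93, "timeliness_p95_min": 12}, slos)
for alert in alerts:
    print(alert)
```

Keeping the owner and escalation path next to the threshold means an alarm never fires without someone being named to handle it.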
