What happens if the *first_name* field of your *Customer* table starts arriving NULL this afternoon? A marketing campaign goes out to 12,000 people starting with "Dear ,". Most breakages of this kind happen because an unwritten expectation between producer and consumer was broken without anyone noticing. Data contracts turn that silent expectation into a written one.
What is a data contract?
An SLA, signed for data, between a producer system (e.g. CRM) and consumers (DWH, ML model, dashboard). Schema, freshness, quality thresholds, semantic meaning, ownership, and change policy live together in a single YAML/JSON document.
Example (abbreviated):
```yaml
contract: customer.v3
owner: crm-team
schema:
  customer_id: string, not_null, unique
  first_name: string, not_null, max_len=100
  email: string, format=email, nullable=true
freshness: max_lag=15m
quality:
  uniqueness_customer_id: 100%
  null_rate_first_name: <1%
breaking_change_policy: 30d_notice
```
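To make the schema section concrete, here is a minimal Python sketch of a record-level check. The `SCHEMA` dict is a hand-written, hypothetical translation of the example contract (a real setup would parse the YAML file), and `validate_record` and the loose email pattern are illustrative names and choices, not part of any particular tool.

```python
import re

# Hypothetical in-memory form of the schema section of customer.v3.
SCHEMA = {
    "customer_id": {"type": str, "nullable": False},
    "first_name": {"type": str, "nullable": False, "max_len": 100},
    "email": {"type": str, "nullable": True, "format": "email"},
}

# Deliberately loose pattern; production email validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_record(record, schema=SCHEMA):
    """Return a list of human-readable violations for one record."""
    violations = []
    for field, rule in schema.items():
        value = record.get(field)
        if value is None:
            # A missing key is treated the same as an explicit None.
            if not rule["nullable"]:
                violations.append(f"{field}: null not allowed")
            continue
        if not isinstance(value, rule["type"]):
            violations.append(f"{field}: expected {rule['type'].__name__}")
            continue
        if "max_len" in rule and len(value) > rule["max_len"]:
            violations.append(f"{field}: longer than {rule['max_len']}")
        if rule.get("format") == "email" and not EMAIL_RE.match(value):
            violations.append(f"{field}: not a valid email")
    return violations
```

Calling `validate_record({"customer_id": "c1", "first_name": None})` reports the null first name that would otherwise surface as "Dear ,". Uniqueness is a dataset-level property, so it is intentionally left out of this per-record check.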
How the silent expectation breaks
Real scenarios:
- The CRM team merges middle name into first_name; length grows from 100 to 200; DWH ETL does not error, but BI text wraps oddly.
- A lookup table's *statuscode* stays in 1-5 for years and an update adds 6; an ML model misclassifies a segment because it has never seen 6.
- A hotfix flips *order_id* from int to string; downstream payment reconciliation silently misaligns.
Each turns into a multi-hour incident. Data contracts catch these changes at deploy time.
The five components of a contract
- Schema guarantees: names, types, nullability, format. The CDC pipeline rides on top of a schema registry.
- Freshness: the latest acceptable lag.
- Quality thresholds: uniqueness, null rate, value range, regex match.
- Semantic definition: what is *first_name*? Which system is authoritative? Linked to the data dictionary.
- Change policy: notice period for breaking changes (typically 30 days), versioning strategy.
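The freshness and quality components are the easiest to measure. The sketch below evaluates the three thresholds from the example contract against a batch of rows. It assumes rows are plain dicts and that the caller knows when the table was last loaded; `check_contract` and its helpers are hypothetical names for illustration.

```python
from datetime import datetime, timedelta, timezone


def null_rate(rows, field):
    """Fraction of rows where `field` is missing or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(field) is None) / len(rows)


def uniqueness(rows, field):
    """Distinct non-null values divided by non-null values (1.0 = all unique)."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return len(set(values)) / len(values) if values else 1.0


def check_contract(rows, last_loaded_at, now):
    """Evaluate the example thresholds; return the names of failed checks."""
    failed = []
    if now - last_loaded_at > timedelta(minutes=15):   # freshness: max_lag=15m
        failed.append("freshness")
    if uniqueness(rows, "customer_id") < 1.0:          # uniqueness_customer_id: 100%
        failed.append("uniqueness_customer_id")
    if null_rate(rows, "first_name") >= 0.01:          # null_rate_first_name: <1%
        failed.append("null_rate_first_name")
    return failed
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.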
Enforcement in production
The contract lives as a static file in the repo and is enforced at three points:
- CI gate: a producer change conflicting with the contract blocks the PR.
- Pipeline test: every ETL run measures contract metrics with Soda Core, dbt tests, or custom assertions; violations trigger an alert.
- Catalog UI: active contracts are visible on the table in the data catalog so consumers see a yes/no answer to "can I trust this table?"
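A CI gate can be as simple as a script that diffs the producer's proposed schema against the contract and exits non-zero on a breaking change. The following is a sketch under assumptions: both schemas are already loaded as dicts, the `CONTRACT` fields borrow from the scenarios earlier in this article, and `breaking_changes` and `ci_gate` are hypothetical helpers rather than the API of any schema registry.

```python
import sys

# Hypothetical contract snapshot; fields borrowed from the scenarios above.
CONTRACT = {
    "customer_id": {"type": "string", "nullable": False},
    "first_name": {"type": "string", "nullable": False, "max_len": 100},
    "order_id": {"type": "int", "nullable": False},
}


def breaking_changes(contract, proposed):
    """List the ways `proposed` breaks the guarantees in `contract`."""
    problems = []
    for field, rule in contract.items():
        new = proposed.get(field)
        if new is None:
            problems.append(f"{field}: removed")
            continue
        if new["type"] != rule["type"]:
            problems.append(f"{field}: type {rule['type']} -> {new['type']}")
        if new["nullable"] and not rule["nullable"]:
            problems.append(f"{field}: became nullable")
        old_len, new_len = rule.get("max_len"), new.get("max_len")
        if old_len is not None and (new_len is None or new_len > old_len):
            problems.append(f"{field}: max_len {old_len} -> {new_len}")
    return problems


def ci_gate(contract, proposed):
    """Return a process exit code: 0 lets the PR through, 1 blocks it."""
    problems = breaking_changes(contract, proposed)
    for p in problems:
        print(f"CONTRACT VIOLATION {p}", file=sys.stderr)
    return 1 if problems else 0
```

In a pipeline, the script would end with `sys.exit(ci_gate(CONTRACT, proposed))`. Fields added by the producer are ignored here on the assumption that additions are non-breaking; a stricter policy could flag them too.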
Tying it to an SLA
A contract violation auto-opens an incident attributed to the producer team's SLA. From this point on the contract is no longer just paper: it is a minus on the producer's monthly score. That single enforcement is the only reliable way to keep contracts alive.
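As a rough illustration of the attribution step, the sketch below turns a failed check into an incident record owned by the contract's `owner` and derives a monthly score from the incident count. The fixed penalty and the scoring formula are purely hypothetical; the article does not prescribe how the score is computed.

```python
from collections import Counter

PENALTY_PER_INCIDENT = 5  # hypothetical: points deducted per violation


def open_incident(contract, check):
    """Build an incident record attributed to the contract's owner."""
    return {
        "contract": contract["contract"],
        "owner": contract["owner"],
        "check": check,
    }


def monthly_scores(incidents, base=100):
    """Score per owning team: base minus a fixed penalty per incident, floored at 0."""
    counts = Counter(i["owner"] for i in incidents)
    return {owner: max(0, base - PENALTY_PER_INCIDENT * n) for owner, n in counts.items()}
```

In practice `open_incident` would call the incident tracker's API instead of returning a dict; the point is only that the owner comes from the contract file, not from whoever happens to notice the breakage.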
Versioning
Contracts are versioned as customer.v1, customer.v2, and so on. When a breaking change is approved as v3:
- v2 and v3 publish in parallel for 60 days.
- Consumers move to v3 with telemetry tracking the migration.
- v2 is retired once everyone is on v3.
This is natural on schema-registry stream platforms (Kafka + Avro/Protobuf); the same outcome is achievable in the batch world with dbt models + discipline.
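In the batch world, much of that discipline comes down to tracking who still reads the old version. Here is a minimal sketch of that bookkeeping, assuming a hypothetical mapping from consumer name to the contract version it currently reads (for example, derived from query logs); `migration_status` is an illustrative name.

```python
from datetime import date, timedelta

PARALLEL_WINDOW = timedelta(days=60)  # v2 and v3 publish side by side for this long


def migration_status(consumers, published_on, today):
    """Summarise a v2 -> v3 migration.

    `consumers` maps consumer name -> contract version it currently reads.
    """
    pending = sorted(name for name, version in consumers.items() if version != "v3")
    window_ends = published_on + PARALLEL_WINDOW
    return {
        "pending": pending,                   # consumers still on the old version
        "window_ends": window_ends,
        "window_elapsed": today >= window_ends,
        "can_retire_v2": not pending,         # retire only once everyone is on v3
    }
```

If `window_elapsed` is true while `pending` is non-empty, that is a signal to escalate with the remaining consumers rather than to switch v2 off.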
CentraQL and data contracts
The CentraQL DataQuality module reads the contract file, generates rules automatically, and weights them into the Trust Score. A table's Trust Score moves with the health of its contract; the CFO dashboard shows lines like "Customer table Trust 87 — 2 freshness violations on contract.v3 in the last 7 days."
Where to start
A pragmatic 3-month plan:
- Weeks 1-2: collect the consumers of your five most critical tables; list the silent expectations.
- Weeks 3-6: write the first five contracts; producer and consumer sign.
- Weeks 7-10: CI gate + Soda Core integration; first violation alerts.
- Weeks 11-12: monthly review forum; expand beyond the first five.
Twelve weeks in, the organisation typically sees ~80% fewer surprise breakages and detection time on the rest dropping from hours to minutes.
Conclusion
Data contracts are not magic; they turn the silent agreement between producer and consumer into a written, measurable, auditable document. Pipeline reliability rises with the active enforcement of that contract; the conversation moves from "good or bad" to "contract.v3 freshness violations: 2 / 7 days." Trust Score, MDM and the governance framework only become a coherent whole when they all stand on top of contracts.
