dbt (data build tool) brought software-engineering discipline to SQL-based data transformations — version control, modularity, automated tests, documentation, all sitting next to your SQL. Between 2021 and 2024 dbt became a global standard, and over the last eighteen months it has been spreading rapidly across Türkiye. So what should you actually do, and what should you avoid?
1. Staging → Intermediate → Marts Layering
The single most important discipline in a dbt project is three-layer organisation:
models/staging/: lightly cleaned source data (renames, type casts, null handling).models/intermediate/: business logic — joins, calculations, transformations.models/marts/: consumer-ready fact and dimension tables.
Once this separation is violated, it becomes impossible to tell where a change really originates.
2. Use ref and source
Models should never be linked with raw SQL joins; use {{ ref('other_model') }} instead. This gives dbt the dependency graph it needs to resolve build order correctly. Source tables should always be referenced via {{ source('schema', 'table') }}.
3. Test Discipline
Every model should carry at least three classes of test:
uniqueandnot_nullon the primary key.accepted_valueson enum columns.relationshipson foreign keys.
Custom tests live as SQL files under tests/. If tests fail in CI, the deploy should stop — no exceptions.
4. Incremental Models
Large fact tables should not full-refresh on every run. Use materialized='incremental' with unique_key and the is_incremental() macro to process only new rows.
5. Exposures and Documentation
exposures.yml declares the downstream consumers of your dbt project (BI reports, ML models). That way nobody deletes a model without seeing who depends on it. Combined with docs/ and schema.yml descriptions, dbt docs generate produces an automatic data dictionary.
6. Semantic Layer / Metrics
dbt Metrics (or the dbt Semantic Layer) lets you define business metrics such as "active customers" once and consume them consistently from Power BI, Looker or any API.
7. Macros and the Package Ecosystem
Community packages like dbt-utils and dbt-expectations provide ready-made macros and tests. Evaluate them before reinventing the wheel.
8. Performance: Materialization Strategy
Every model can be a view, table, incremental or ephemeral. Small transformation steps should be ephemeral (inlined as CTEs), final marts should be tables, and large facts should be incremental. The wrong choice costs you either performance or money.
9. CI/CD
Slim CI — running only changed models and their downstreams — turns thirty-minute pipelines into three-minute ones. dbt build --select state:modified+ --defer is the pattern.
Anti-Patterns to Avoid
- Business logic in the staging layer.
- A single mart model that grows to 500+ lines of SQL.
- Missing tests that surface as duplicates in production.
insertinstead ofmergefor late-arriving data in incremental models.
