Technical Deep Dive

Star Schema or Snowflake? The Dimensional Modelling Decision

Kimball's classic debate is still meaningful today: star versus snowflake schema and what actually changes in the cloud era.

BIART Team · 2 min read
[Figure: data model diagram]

Since Ralph Kimball's "The Data Warehouse Toolkit" was published in 1996, the star schema has been the gold standard for analytical data models. With the rise of modern cloud platforms, however, the snowflake schema — and even more radical shapes — are back on the agenda. What should the decision actually rest on?

Star Schema

A central fact table (e.g. sales_fact) surrounded by denormalised dimension tables (dim_customer, dim_product, dim_date). Each dimension lives in a single table; JOINs are kept to a minimum. The upside: simplicity, performance, and the natural shape BI tools expect.

              dim_customer
                   │
dim_product ── sales_fact ── dim_date
                   │
               dim_store
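The star shape above can be sketched in a few lines of SQL. This is a minimal, hypothetical illustration (table and column names are invented for the example, not prescribed by the article), using SQLite so it runs anywhere:

```python
import sqlite3

# Minimal star-schema sketch: one fact table, each dimension a single
# denormalised table, one JOIN away from the fact.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE sales_fact   (customer_key INTEGER, product_key INTEGER, amount REAL);
INSERT INTO dim_customer VALUES (1, 'Acme', 'Istanbul');
INSERT INTO dim_product  VALUES (10, 'Widget', 'Hardware');
INSERT INTO sales_fact   VALUES (1, 10, 100.0), (1, 10, 50.0);
""")

# The defining star property: every dimension is reachable with exactly one JOIN.
row = con.execute("""
    SELECT c.city, p.category, SUM(f.amount)
    FROM sales_fact f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_product  p ON p.product_key  = f.product_key
    GROUP BY c.city, p.category
""").fetchone()
print(row)  # ('Istanbul', 'Hardware', 150.0)
```

This flat JOIN pattern is exactly what BI tools generate from a star model.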

Snowflake Schema

Dimensions are normalised into chains such as dim_product → dim_category → dim_department. Upside: less storage, easier maintenance. Downside: more JOINs, more complex queries, heavier BI semantic modelling.
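The JOIN cost is easy to see in a sketch. Under the same hypothetical naming as above (invented for illustration), reaching the department level now requires a chain of JOINs through the normalised hierarchy:

```python
import sqlite3

# Snowflake sketch: the product dimension is split across a normalised chain.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_department (department_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_category   (category_key INTEGER PRIMARY KEY, name TEXT,
                             department_key INTEGER REFERENCES dim_department);
CREATE TABLE dim_product    (product_key INTEGER PRIMARY KEY, name TEXT,
                             category_key INTEGER REFERENCES dim_category);
CREATE TABLE sales_fact     (product_key INTEGER, amount REAL);
INSERT INTO dim_department VALUES (1, 'Electronics');
INSERT INTO dim_category   VALUES (5, 'Phones', 1);
INSERT INTO dim_product    VALUES (10, 'Model X', 5);
INSERT INTO sales_fact     VALUES (10, 200.0);
""")

# Reporting by department now takes three JOINs instead of one.
row = con.execute("""
    SELECT d.name, SUM(f.amount)
    FROM sales_fact f
    JOIN dim_product    p ON p.product_key    = f.product_key
    JOIN dim_category   c ON c.category_key   = p.category_key
    JOIN dim_department d ON d.department_key = c.department_key
    GROUP BY d.name
""").fetchone()
print(row)  # ('Electronics', 200.0)
```

Each extra hop is one more relationship the BI semantic layer has to model.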

Modern Cloud Perspective

In the era of columnar storage (Snowflake, BigQuery, Synapse), the size of a denormalised table is no longer as important as it once was. A 1 TB dim_customer can compress down to 50 GB with columnar compression, which seriously weakens the "snowflake saves space" argument.

On the other hand, platforms that bill by data scanned, such as GCP BigQuery and AWS Athena, make every extra snowflake JOIN read additional tables; on those platforms the star schema tends to be the more economical choice.

A Practical Guide

  • BI-first DWH (Power BI, Tableau, Qlik): star schema — native integration, consistent performance.
  • Data-science + analytics hybrid: star, but with small linked tables for the hierarchies inside dimensions (a light snowflake).
  • Highly dynamic product catalogue (retail, e-commerce): a denormalised dim_product incurs a performance penalty on daily rebuilds; either go incremental or snowflake.
  • Regulatory reports: snapshot facts plus time-dated SCD Type 2 dimensions (on top of either model).
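The SCD Type 2 pattern mentioned for regulatory reports can be sketched in a few lines. Column names such as `valid_from`/`valid_to` are illustrative conventions, not prescribed by the article:

```python
from datetime import date

# SCD Type 2 sketch: each change to a dimension attribute creates a new row
# with a validity window instead of overwriting the old value.
dim_customer = [
    {"customer_id": 1, "city": "Ankara",
     "valid_from": date(2020, 1, 1), "valid_to": date(2022, 6, 30)},
    {"customer_id": 1, "city": "Istanbul",
     "valid_from": date(2022, 7, 1), "valid_to": date(9999, 12, 31)},
]

def version_as_of(rows, customer_id, as_of):
    """Point-in-time lookup: the row that was valid on the given date."""
    for r in rows:
        if r["customer_id"] == customer_id and r["valid_from"] <= as_of <= r["valid_to"]:
            return r
    return None

# A report dated 2021 sees the historical city, not the current one.
print(version_as_of(dim_customer, 1, date(2021, 3, 15))["city"])  # Ankara
print(version_as_of(dim_customer, 1, date(2023, 1, 1))["city"])   # Istanbul
```

Joining a snapshot fact to the dimension version valid on the snapshot date is what makes regulatory reports reproducible after the fact.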

Fields That Should Be Denormalised

  • Date hierarchies (year/quarter/month/day): always denormalised in dim_date.
  • Customer demographics: usually flattened into dim_customer.
  • Product category paths: if there are four or more levels, snowflake may win.
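A denormalised dim_date, the first bullet above, is the simplest case to show: the whole year/quarter/month/day hierarchy is computed once and stored flat on every row. A minimal sketch (the generator function is hypothetical):

```python
from datetime import date, timedelta

def build_dim_date(start, days):
    """Generate denormalised dim_date rows: the full calendar hierarchy
    lives on every row, so queries never JOIN to a separate hierarchy table."""
    rows = []
    for i in range(days):
        d = start + timedelta(days=i)
        rows.append({
            "date_key": int(d.strftime("%Y%m%d")),  # surrogate key, e.g. 20240330
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "month": d.month,
            "day": d.day,
        })
    return rows

# Three days spanning a quarter boundary:
dim_date = build_dim_date(date(2024, 3, 30), 3)
print(dim_date[0])  # {'date_key': 20240330, 'year': 2024, 'quarter': 1, 'month': 3, 'day': 30}
print(dim_date[2]["quarter"])  # 2
```

Because the calendar never changes retroactively, the usual downside of denormalisation (update ripple) does not apply here.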

The One Big Table (OBT) Approach

Modern ML pipelines and some self-service BI scenarios prefer a single wide table to a dimensional model. No JOINs, full context on a single row. The downside is steep maintenance cost; any change ripples through the whole table. Our recommendation: use OBT only for ML feature stores — for general BI, dimensional modelling still wins.
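Conceptually, building an OBT is just pre-joining the fact with all of its dimensions into one wide row. A toy sketch with invented names (in practice this is typically a CREATE TABLE AS SELECT over the star model):

```python
# OBT sketch: flatten a star model into one wide, denormalised table.
# Dimension lookups are plain dicts keyed by surrogate key.
dim_customer = {1: {"customer_name": "Acme", "city": "Istanbul"}}
dim_product = {10: {"product_name": "Widget", "category": "Hardware"}}
sales_fact = [{"customer_key": 1, "product_key": 10, "amount": 150.0}]

# Every fact row is merged with its dimension attributes: no JOINs at query
# time, full context on a single row.
obt = [
    {**f, **dim_customer[f["customer_key"]], **dim_product[f["product_key"]]}
    for f in sales_fact
]
print(obt[0]["city"], obt[0]["category"], obt[0]["amount"])  # Istanbul Hardware 150.0
```

The maintenance cost is visible even here: rename a dimension attribute and every OBT row must be rebuilt, which is why the article recommends restricting OBT to ML feature stores.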

Takeaway

Default to the star schema: on modern cloud platforms it remains the safest choice for BI workloads. Normalise into a snowflake only where deep hierarchies or volatile dimensions genuinely justify the extra JOINs, and reserve One Big Table for ML feature stores.
