Since Ralph Kimball's "The Data Warehouse Toolkit" was published in 1996, the star schema has been the gold standard for analytical data models. With the rise of modern cloud platforms, however, the snowflake schema — and even more radical shapes — are back on the agenda. What should the decision actually rest on?
Star Schema
A central fact table (e.g. sales_fact) surrounded by denormalised dimension tables (dim_customer, dim_product, dim_date). Each dimension lives in a single table; JOINs are kept to a minimum. The upside: simplicity, performance, and the natural shape BI tools expect.
sales_fact ←→ dim_customer ←→ dim_product ←→ dim_date ←→ dim_store
Snowflake Schema
Dimensions are normalised — dim_product → dim_category → dim_department and similar relationships. Upside: less storage, easier maintenance. Downside: more JOINs, more complex queries, heavier BI semantic modelling.
Modern Cloud Perspective
In the era of columnar storage (Snowflake, BigQuery, Synapse), the size of a denormalised table is no longer as important as it once was. A 1 TB dim_customer can compress down to 50 GB with columnar compression, which seriously weakens the "snowflake saves space" argument.
On the other hand, platforms like GCP BigQuery and AWS Athena still price JOINs aggressively; on those platforms star schema tends to be the more economical choice.
A Practical Guide
- BI-first DWH (Power BI, Tableau, Qlik): star schema — native integration, consistent performance.
- Data-science + analytics hybrid: star, but with small linked tables for the hierarchies inside dimensions (a light snowflake).
- Highly dynamic product catalogue (retail, e-commerce): a denormalised
dim_productincurs a performance penalty on daily rebuilds; either go incremental or snowflake. - Regulatory reports: snapshot facts plus time-dated SCD Type 2 dimensions (on top of either model).
Fields That Should Be Denormalised
- Date hierarchies (year/quarter/month/day): always denormalised in
dim_date. - Customer demographics: usually flattened into
dim_customer. - Product category paths: if there are four or more levels, snowflake may win.
The One Big Table (OBT) Approach
Modern ML pipelines and some self-service BI scenarios prefer a single wide table to a dimensional model. No JOINs, full context on a single row. The downside is steep maintenance cost; any change ripples through the whole table. Our recommendation: use OBT only for ML feature stores — for general BI, dimensional modelling still wins.
