Press "Enter" to skip to content

Things Not to Include in Data Warehouses

Erik Darling compiles a list:

This is a list of things I see in data warehouses that make me physically ill:

– Unique constraints of any kind: Primary Keys, Indexes, etc. Make things unique during your staging process. Don’t make your indexes do that work.

Read on for the full list. I agree with everything except clustered row-store indexes. Those make a lot of sense on dimension tables, tied to the Kimball-style surrogate keys you create in the warehouse itself.

The other part I disagree with is non-clustered columnstore indexes, which I’ve rarely found good use for. Clustrered columnstore indexes are outstanding but the non-clustered variety…meh at best. This answer comes primarily because the pattern I tend to use for warehouse queries is to drive from the fact table, aggregate as much as I can there, and connect to the dimensions for further information at the end. If your warehouse access patterns differ radically from this, you might get more out of non-clustered columnstore indexes. Maybe.