Press "Enter" to skip to content

Using Date Types In Warehouses

Koen Verbeeck argues that date keys in warehouses should be actual date types:

The worst are by far the string representation, as there is no actual check on the contents. It can literally contain everything. And is ’01/02/2018′ the first of February 2018 (like any sane person would read, because days come before months), or the 2nd of January? So if you have to store dates in your data warehouse, avoid strings at all costs. No excuses.

The integer representation – e.g. 20171208 – is really popular. If I recall Kimball correctly, he said it’s the one exception where you can use smart keys, aka surrogate keys that have a meaning embedded into them. I used them for quite some time, but I believe I have found a better alternative: using the actual date data type.

I bounce back and forth, but I’m sympathetic to Koen’s argument, which you can read by clicking through.