Junk Dimensions – Curated SQL

Jesse Seymour talks about junk dimensions in warehousing:

I think one of the single biggest challenges I face as I attempt to warehouse data that originates as a SharePoint list is the handful of miscellaneous descriptive fields, such as approval status, request status, or something similar. Typically, this fields are setup as Choice fields in the SharePoint list so they have a known range of values, but its still a pain to have to build a dimension for each one.

Enter the junk dimension. Ever since I learned about this concept it has made my life so much easier. What the junk dimension does is perform a cross join against the different fields and creates a row for every possible combination of fields.

Junk dimensions are nice for those low-cardinality attributes which are important but don’t really fit anywhere else. The important thing to remember about a junk dimension is that you don’t want it to be too large: if you have 5 attributes, each of which has 8 possible values, you have 8^5 (32,768) rows. That’s not so bad, but make it 10 attributes and now your table has 1,073,741,824 rows, and that’s a lot of rows for a single dimension. If you find yourself in that scenario, you might want to create two junk dimensions (bringing you back to 2 dimensions with 32K rows), review your design to see if all those attributes are necessary, or review your design to see if your “junk” dimension is hiding a real dimension.

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29