Junk Dimensions

Jesse Seymour talks about junk dimensions in warehousing:

I think one of the single biggest challenges I face as I attempt to warehouse data that originates as a SharePoint list is the handful of miscellaneous descriptive fields, such as approval status, request status, or something similar.  Typically, this fields are setup as Choice fields in the SharePoint list so they have a known range of values, but its still a pain to have to build a dimension for each one.

Enter the junk dimension.  Ever since I learned about this concept it has made my life so much easier.  What the junk dimension does is perform a cross join against the different fields and creates a row for every possible combination of fields.

Junk dimensions are nice for those low-cardinality attributes which are important but don’t really fit anywhere else.  The important thing to remember about a junk dimension is that you don’t want it to be too large:  if you have 5 attributes, each of which has 8 possible values, you have 8^5 (32,768) rows.  That’s not so bad, but make it 10 attributes and now your table has 1,073,741,824 rows, and that’s a lot of rows for a single dimension.  If you find yourself in that scenario, you might want to create two junk dimensions (bringing you back to 2 dimensions with 32K rows), review your design to see if all those attributes are necessary, or review your design to see if your “junk” dimension is hiding a real dimension.

Related Posts

Column-Level Security In Azure SQL Data Warehouse

Kavitha Jonnakuti announces a new feature for Azure SQL Data Warehouse: Access to the table columns can be controlled based on the user’s execution context or their group membership with the standard GRANT T-SQL statement. To secure your data, you simply define a security policy via the GRANT statement to your table columns. For example, if you […]

Read More

Auditing Options With Azure SQL Data Warehouse

Janusz Rokicki explores what is available in Azure SQL Data Warehouse when it comes to auditing: Auditing is disabled by default and the UI experience depends on the region to which the logical server is deployed. For instance, in UK South, the portal offers no options to manage auditing: In North Europe, the portal allows […]

Read More

Categories

February 2016
MTWTFSS
« Jan Mar »
1234567
891011121314
15161718192021
22232425262728
29