Sorting Power Pivot Data On Load

Matt Allington suggests pre-sorting results to reduce load in Power Pivot:

Imagine you have 50,000 products in your data table and you have 50,000,000 rows of data.  Power Pivot will take the first 1 million rows it comes to (1 segment worth), work out how to sort and compress the columns, and then compress the data into a single segment before moving to the next 1 million rows it comes to (in the order they are loaded).  When it does this, it is highly likely that every product number will appear in every single segment – all 50 segments.  If we assume an equal number of product records for each product (unlikely but OK for this discussion), then there would be 1,000 records for each product spread throughout the entire data table,and each and every segment is likely to contain all 50,000 product IDs.  This is not good for compression.

This is an interesting result and not something I would have thought intuitive.

Related Posts

Character Columns And MAX Vs TOP+ORDER Differences

Kendra Little digs into a tricky performance problem: Most of the time in SQL Server, the MAX() function and a TOP(1) ORDER BY DESC will behave very similarly. If you give them a rowstore index leading on the column in question, they’re generally smart enough to go to the correct end of the index, and […]

Read More

Row Goals And Anti-Joins

Paul White continues his row goals series: The optimizer assumes that people write a semi join (indirectly e.g. using EXISTS) with the expectation that the row being searched for will be found. An apply semi join row goal is set by the optimizer to help find that expected matching row quickly. For anti join (expressed e.g. using NOT EXISTS) the optimizer’s assumption is that […]

Read More


February 2016
« Jan Mar »