Sorting Power Pivot Data On Load

Matt Allington suggests pre-sorting results to reduce load in Power Pivot:

Imagine you have 50,000 products in your data table and you have 50,000,000 rows of data.  Power Pivot will take the first 1 million rows it comes to (1 segment worth), work out how to sort and compress the columns, and then compress the data into a single segment before moving to the next 1 million rows it comes to (in the order they are loaded).  When it does this, it is highly likely that every product number will appear in every single segment – all 50 segments.  If we assume an equal number of product records for each product (unlikely but OK for this discussion), then there would be 1,000 records for each product spread throughout the entire data table,and each and every segment is likely to contain all 50,000 product IDs.  This is not good for compression.

This is an interesting result and not something I would have thought intuitive.

Related Posts

When Adding Indexes Hurts Performance

Jeffry Schwartz takes us through an odd case: Recently, a customer requested that we tune a query that took 13 seconds to return 11 rows.  SQL Server 2017 suggested an index to improve performance, so we added it in a development environment.  The improvement made the query run 647 seconds, almost 50 TIMES longer than the original!  This naturally caused […]

Read More

Power Query and the Benefits of Immutability

Chris Webb explains why immutable expressions can be faster to run multiple times than mutable processes: Instead of taking the value #”Sorted Rows”[Column2]{0} and storing it in the variable Column2 then adding Column2 four times, I’m  adding the expression #”Sorted Rows”[Column2]{0} together four times. The query returns the same number as the previous query. However […]

Read More

Categories

February 2016
MTWTFSS
« Jan Mar »
1234567
891011121314
15161718192021
22232425262728
29