When Stream Aggregates Require Sorting

Itzik Ben-Gan continues his series on grouping and aggregation:

Let’s try and figure out the costing formula for the Sort operator. Remember, our focus is the estimated cost and scaling because our ultimate goal is to figure out optimization thresholds where the optimizer changes its choices from one strategy to another.

The I/O cost estimate seems to be fixed: 0.0112613. I get the same I/O cost irrespective of factors like number of rows, number of sort columns, data type, and so on. This is probably to account for some anticipated I/O work.

As for the CPU cost, alas, Microsoft doesn’t publicly expose the exact algorithms that they use for sorting. However, among the common algorithms used for sorting by database engines in general are different implementations of merge sort and quicksort. Thanks to efforts made by Paul White, who’s fond of looking at Windows debugger stack traces (not all of us have the stomach for this), we have a bit more insight into the topic, published in his series “Internals of the Seven SQL Server Sorts.” According to Paul’s findings, the general sort class (used in the above plan) uses merge sort (first internal, then transitioning to external). In average, this algorithm requires n log n comparisons to sort n items. With this in mind, it’s probably a safe bet as a starting point to assume that the CPU part of the operator’s cost is based on a formula such as:

Operator CPU cost = <startup cost> + @numrows * LOG(@numrows) * <comparison cost>

Of course, this could be an oversimplification of the actual costing formula that Microsoft uses, but absent any documentation on the matter, this is an initial best guess.

It’s interesting to see how the calculation changes in form with larger numbers of rows.

Related Posts

Minor Differences Between Plan Cache And Query Store Plans

Grant Fritchey shows us some minor differences between what the Query Store shows for a particular execution plan versus what exists in the plan cache: As you can see, while the structure of the plans are identical, not everything is. The Compile values are different (although sometimes, they’ll be the same, that one is kind […]

Read More

Deep Dive On Index Spools

Hugo Kornelis takes a look at index spools: The Index Spool operator is one of the four spool operators that SQL Server supports. It retains a copy of all data it reads in an indexed worktable (in tempdb), and can then later return subsets of these rows without having to call its child operators to produce them […]

Read More

Categories

May 2018
MTWTFSS
« Apr Jun »
 123456
78910111213
14151617181920
21222324252627
28293031