When Stream Aggregates Require Sorting

Itzik Ben-Gan continues his series on grouping and aggregation:

Let’s try and figure out the costing formula for the Sort operator. Remember, our focus is the estimated cost and scaling because our ultimate goal is to figure out optimization thresholds where the optimizer changes its choices from one strategy to another.

The I/O cost estimate seems to be fixed: 0.0112613. I get the same I/O cost irrespective of factors like number of rows, number of sort columns, data type, and so on. This is probably to account for some anticipated I/O work.

As for the CPU cost, alas, Microsoft doesn’t publicly expose the exact algorithms that they use for sorting. However, among the common algorithms used for sorting by database engines in general are different implementations of merge sort and quicksort. Thanks to efforts made by Paul White, who’s fond of looking at Windows debugger stack traces (not all of us have the stomach for this), we have a bit more insight into the topic, published in his series “Internals of the Seven SQL Server Sorts.” According to Paul’s findings, the general sort class (used in the above plan) uses merge sort (first internal, then transitioning to external). In average, this algorithm requires n log n comparisons to sort n items. With this in mind, it’s probably a safe bet as a starting point to assume that the CPU part of the operator’s cost is based on a formula such as:

Operator CPU cost = <startup cost> + @numrows * LOG(@numrows) * <comparison cost>

Of course, this could be an oversimplification of the actual costing formula that Microsoft uses, but absent any documentation on the matter, this is an initial best guess.

It’s interesting to see how the calculation changes in form with larger numbers of rows.

Related Posts

Hiding Work: The Nested Loop Operator

Erik Darling explains that the nested loop operator is like a duck: there’s more going on beneath the surface than it lets on: I’m going to talk about my favorite example, because it can cause a lot of confusion, and can hide a lot of the work it’s doing behind what appears to be a […]

Read More

Stored Procedure IF Branching and Performance

Erik Darling explains that the IF block in a stored procedure won’t help you with performance: Making plan choices with IF branches like this plain doesn’t work.The optimizer compiles a plan for both branches based on the initial compile value.What you end up with is a stored proc that doesn’t do what it’s supposed to […]

Read More

Categories

May 2018
MTWTFSS
« Apr Jun »
 123456
78910111213
14151617181920
21222324252627
28293031