When Stream Aggregates Require Sorting

Itzik Ben-Gan continues his series on grouping and aggregation:

Let’s try and figure out the costing formula for the Sort operator. Remember, our focus is the estimated cost and scaling because our ultimate goal is to figure out optimization thresholds where the optimizer changes its choices from one strategy to another.

The I/O cost estimate seems to be fixed: 0.0112613. I get the same I/O cost irrespective of factors like number of rows, number of sort columns, data type, and so on. This is probably to account for some anticipated I/O work.

As for the CPU cost, alas, Microsoft doesn’t publicly expose the exact algorithms that they use for sorting. However, among the common algorithms used for sorting by database engines in general are different implementations of merge sort and quicksort. Thanks to efforts made by Paul White, who’s fond of looking at Windows debugger stack traces (not all of us have the stomach for this), we have a bit more insight into the topic, published in his series “Internals of the Seven SQL Server Sorts.” According to Paul’s findings, the general sort class (used in the above plan) uses merge sort (first internal, then transitioning to external). In average, this algorithm requires n log n comparisons to sort n items. With this in mind, it’s probably a safe bet as a starting point to assume that the CPU part of the operator’s cost is based on a formula such as:

Operator CPU cost = <startup cost> + @numrows * LOG(@numrows) * <comparison cost>

Of course, this could be an oversimplification of the actual costing formula that Microsoft uses, but absent any documentation on the matter, this is an initial best guess.

It’s interesting to see how the calculation changes in form with larger numbers of rows.

Related Posts

Getting An Accurate Query Execution Time

Grant Fritchey shares some tips on accurate query time estimation: Before we get into all the choices and compare them, let’s baseline on methodology and a query to use. Not sure why, but many people give me blow back when I say “on average, this query runs in X amount of time.” The feedback goes […]

Read More

Ways To Check For Non-Existence

Brent Ozar shows two methods for finding records missing associated child records: You’re writing a query, and you wanna check to see if rows exist in a table. I’m using the free Stack Overflow database, and I wanna find all of the users who have not left a comment. The tables involved are: In dbo.Users, the Id field […]

Read More

Categories

May 2018
MTWTFSS
« Apr Jun »
 123456
78910111213
14151617181920
21222324252627
28293031