Cardinality Estimation On COUNT(*)

Paul White digs into how the cardinality estimator works with COUNT aggregations containing HAVING clauses:

The approach SQL Server takes is to assume that each group is most likely to contain the overall mean (average) number of rows. This is simply the cardinality divided by the number of unique values. For example, for 1000 rows with 20 unique values, SQL Server would assume that (1000 / 20) = 50 rows per group is the most likely value.

Turning back to our original example, this means that the computed count column is “most likely” to contain a value around (19614 / 575) ~= 34.1113. Since density is the reciprocal of the number of unique values, we can also express that as cardinality * density = (19614 * 0.00173913), giving a very similar result.

Definitely worth a careful read.

Related Posts

Collecting Statistics Usage Info

Grant Fritchey shows us how (safely) to collect data on statistics usage: Years ago I was of the opinion that it wasn’t really possible to see the statistics used in the generation of a query plan. If you read the comments here, I was corrected of that notion. However, I’ve never been a fan of using […]

Read More

Understanding DBCC SHOW_STATISTICS Outputs

Bill Wolf continues his series on statistics by looking at what DBCC SHOW_STATISTICS gives you: When I was putting together the lesson plans for this, I wanted to make my own query for the comparisons, not borrow one from another site or blog.  Yes, I borrow plenty, but I wanted this to be mine.  When […]

Read More

Categories

April 2017
MTWTFSS
« Mar May »
 12
3456789
10111213141516
17181920212223
24252627282930