Cardinality Estimation On COUNT(*)

Paul White digs into how the cardinality estimator handles COUNT(*) aggregates that are filtered by a HAVING clause:

The approach SQL Server takes is to assume that each group is most likely to contain the overall mean (average) number of rows. This is simply the cardinality divided by the number of unique values. For example, for 1000 rows with 20 unique values, SQL Server would assume that (1000 / 20) = 50 rows per group is the most likely value.

Turning back to our original example, this means that the computed count column is “most likely” to contain a value around (19614 / 575) ~= 34.1113. Since density is the reciprocal of the number of unique values, we can also express that as cardinality * density = (19614 * 0.00173913), giving a very similar result.
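The arithmetic in the excerpt is easy to check by hand. Here is a minimal Python sketch of it, using only the numbers quoted above; the function names are made up for illustration and are not part of SQL Server or of Paul's post.

```python
# Sketch of the "mean rows per group" arithmetic from the quoted excerpt.
# Function names are hypothetical; only the input numbers come from the post.

def rows_per_group(cardinality: float, distinct_values: float) -> float:
    """Mean rows per group: total rows divided by number of unique values."""
    return cardinality / distinct_values


def rows_per_group_from_density(cardinality: float, density: float) -> float:
    """Same estimate expressed with density, where density = 1 / distinct values."""
    return cardinality * density


# The simple example: 1000 rows with 20 unique values.
simple_estimate = rows_per_group(1000, 20)

# The example from the post: 19614 rows with 575 unique values.
estimate_by_division = rows_per_group(19614, 575)

# The same example using the rounded density value quoted in the post.
estimate_by_density = rows_per_group_from_density(19614, 0.00173913)

print(simple_estimate)
print(round(estimate_by_division, 4))
print(round(estimate_by_density, 4))
```

The two results for the 19614-row example differ only slightly because 0.00173913 is a rounded form of 1/575, which is why the excerpt calls the density result "very similar" rather than identical.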

Definitely worth a careful read.

Related Posts

The Costs Of Statistics Updates With FULLSCAN

Kendra Little explains what happens when you update a table’s statistics with FULLSCAN: On my test instance, the command that uses the default sampling takes 6 seconds to complete. The command which adds “WITH FULLSCAN” takes just over five minutes to complete. The reason is that those two little words can add a whole lot of […]


NULL Values In The Histogram

Taiob Ali explains how NULL values show up in the SQL Server histogram when you create statistics: In the density_vector section ‘All density’ value for column ‘PickingCompletedWhen’ is 0.0004705882 which was calculated from: 1/(Number of distinct values of column ‘PickingCompletedWhen’). In this case which is 1/2125. All NULL values were considered as one. If you do […]

