Cardinality Estimation On COUNT(*)

Paul White digs into how the cardinality estimator works with COUNT aggregations containing HAVING clauses:

The approach SQL Server takes is to assume that each group is most likely to contain the overall mean (average) number of rows. This is simply the cardinality divided by the number of unique values. For example, for 1000 rows with 20 unique values, SQL Server would assume that (1000 / 20) = 50 rows per group is the most likely value.

Turning back to our original example, this means that the computed count column is “most likely” to contain a value around (19614 / 575) ~= 34.1113. Since density is the reciprocal of the number of unique values, we can also express that as cardinality * density = (19614 * 0.00173913), giving a very similar result.

Definitely worth a careful read.

Related Posts

The Value Of Statistics In SQL Server

Monica Rathbun walks us through the benefits of having statistics on tables in SQL Server: Statistics are made up of three parts. Each part tells the optimizer important information regarding the make up the table’s data distribution. Header – Last Time Stats were updated and number of sample rows Density Vector – Uniqueness of the columns or […]

Read More

Update Statistics After An Upgrade

Erin Stellato gives us some good life advice: There are a variety of methods we use for helping customers upgrade to a new SQL Server version, and one question we often get asked is whether or not statistics need to be updated as part of the upgrade process. tl;dr Yes.  Update statistics after an upgrade.  Further, […]

Read More

Categories

April 2017
MTWTFSS
« Mar May »
 12
3456789
10111213141516
17181920212223
24252627282930