Combining Densities

Paul White explains how the SQL Server cardinality estimator will build an estimate involving multiple single-column statistics:

The task of estimating the number of rows produced by a GROUP BY clause is trivial when only a single column is involved (assuming no other predicates). For example, it is easy to see that GROUP BY Shelf will produce 21 rows; GROUP BY Bin will produce 62.

However, it is not immediately clear how SQL Server can estimate the number of distinct (Shelf, Bin) combinations for our GROUP BY Shelf, Bin query. To put the question in a slightly different way: Given 21 shelves and 62 bins, how many unique shelf and bin combinations will there be? Leaving aside physical aspects and other human knowledge of the problem domain, the answer could be anywhere from max(21, 62) = 62 to (21 * 62) = 1,302. Without more information, there is no obvious way to know where to pitch an estimate in that range.

Yet, for our example query, SQL Server estimates 744.312 rows (rounded to 744 in the Plan Explorer view) but on what basis?

Read on for debugger usage, Shannon entropy calculations, and all kinds of other fun stuff.

Related Posts

SET STATISTICS IO And Automated Statistics Updates

Niko Neugebauer shows us something new in SQL Server 2019: There has never been such information before!We are just writing into it!Why do we have those wonderful 1351498 logical reads ?Are they actually writes ? And if they would be, would not it be correct to display them as physical accesses ? The answer is […]

Read More

Creating Multi-Column Statistics From Missing Index DMVs

Max Vernon shows how you can use the missing index DMVs to find potential candidates for multi-column statistics: SQL Server does have a fairly useful dynamic management view, or DMV, which provides insight that can be leveraged in this area. The DMV I’m talking about is the set of DMVs around missing indexes, consisting of […]

Read More


August 2017
« Jul Sep »