The task of estimating the number of rows produced by a
GROUP BYclause is trivial when only a single column is involved (assuming no other predicates). For example, it is easy to see that
GROUP BY Shelfwill produce 21 rows;
GROUP BY Binwill produce 62.
However, it is not immediately clear how SQL Server can estimate the number of distinct
(Shelf, Bin)combinations for our
GROUP BY Shelf, Binquery. To put the question in a slightly different way: Given 21 shelves and 62 bins, how many unique shelf and bin combinations will there be? Leaving aside physical aspects and other human knowledge of the problem domain, the answer could be anywhere from max(21, 62) = 62 to (21 * 62) = 1,302. Without more information, there is no obvious way to know where to pitch an estimate in that range.
Yet, for our example query, SQL Server estimates 744.312 rows (rounded to 744 in the Plan Explorer view) but on what basis?
Read on for debugger usage, Shannon entropy calculations, and all kinds of other fun stuff.