Stats Terminology

Erik Darling fills in gaps on statistics terminology in his unique style:

SELECTIVITY

This tells you how special your snowflakes are. When a column is called “highly selective” that usually means values aren’t repeating all that often, if at all. Think about order numbers, identity or sequence values, GUIDs, etc.

DENSITY

This is sort of the anti-matter to selectivity. Highly dense columns aren’t very unique. They’ll return a lot of rows for a given value. Think about Zip Codes, Gender, Marital Status, etc. If you were to select all the people in 10002, a densely (there’s that word again) populated zip code in Chinatown, you’d probably wait a while, kill the query, and add another filter.

Combine that with Kendra Little’s statistics FAQ for additional learning.

Related Posts

Cloning And Columnstore Statistics

Niko Neugebauer points out a fix in SQL Server 2019: I have a huge love for the DBCC CLONEDATABASE command – it has been made available (backported) to every SQL Server version starting with SQL Server 2012, since the original release in SQL Server 2014, while being constantly improved in the Service Packs and Cumulative Updates. This […]

Read More

The Value Of Auto-Created Statistics

Brent Ozar is here to praise statistics auto-creation: Let me rephrase: before you even start playing around with statistics, make sure you haven’t taken away SQL Server’s ability to do this for you. I like to make fun of a lot of SQL Server’s built-in “auto-tuning” capabilities that do a pretty terrible job. Cost Threshold for […]

Read More

Categories

April 2016
MTWTFSS
« Mar May »
 123
45678910
11121314151617
18192021222324
252627282930