Base Versus Simple Containment

Joe Obbish takes a crack at explaining the difference between base containment and simple containment for cardinality estimation:

We know that the first query will return 500k rows and the second query will return 0 rows. However, can SQL Server know that? Each statistics object only contains information about its own column. There’s no correlation between the UNIQUE_ID and MOD_FILTER columns, so there isn’t a way for SQL Server to know that the queries will return different results. The query optimizer can create an estimate based on the filters in the WHERE clause and on the histograms of the join columns, but there’s no foolproof way to do that calculation. The presence of the filters introduces uncertainty into the estimate, even with statistics that perfectly describe the data for each column. The containment assumption is all about the modeling assumption that SQL Server has to make to resolve that uncertainty.

It’s an interesting post aimed at getting you to think like a simplified cardinality estimator. SQL Server doesn’t behave exactly like this, but it’s a good mental reference point.
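For a concrete mental model, here is a minimal T-SQL sketch of the kind of setup the quote describes. The column names UNIQUE_ID and MOD_FILTER come from the excerpt above; the table names, row counts, and the row-generating subquery are my own reconstruction, not Joe’s exact repro.

-- Two copies of a million-row table where MOD_FILTER is computed from
-- UNIQUE_ID, so the columns are perfectly correlated even though the
-- per-column statistics objects cannot see it.
CREATE TABLE dbo.T1 (UNIQUE_ID INT NOT NULL, MOD_FILTER INT NOT NULL);
CREATE TABLE dbo.T2 (UNIQUE_ID INT NOT NULL, MOD_FILTER INT NOT NULL);

INSERT INTO dbo.T1 (UNIQUE_ID, MOD_FILTER)
SELECT n, n % 2
FROM (
    SELECT TOP (1000000)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_columns AS c1
    CROSS JOIN sys.all_columns AS c2
) AS nums;

INSERT INTO dbo.T2 (UNIQUE_ID, MOD_FILTER)
SELECT UNIQUE_ID, MOD_FILTER
FROM dbo.T1;

-- Query 1: both filters agree, so every filtered T1 row finds its match in T2.
-- 500k rows actually come back.
SELECT COUNT(*)
FROM dbo.T1 AS t1
INNER JOIN dbo.T2 AS t2
    ON t1.UNIQUE_ID = t2.UNIQUE_ID
WHERE t1.MOD_FILTER = 0
  AND t2.MOD_FILTER = 0;

-- Query 2: the filters disagree, and MOD_FILTER is a function of the join
-- column, so no row can satisfy both sides. 0 rows actually come back.
SELECT COUNT(*)
FROM dbo.T1 AS t1
INNER JOIN dbo.T2 AS t2
    ON t1.UNIQUE_ID = t2.UNIQUE_ID
WHERE t1.MOD_FILTER = 0
  AND t2.MOD_FILTER = 1;

Because per-column statistics make the two queries look identical, both get the same estimate under either model; the models differ only in the number they produce. Using the usual textbook formulas: base containment (the new CE’s default) computes join selectivity from the base histograms and applies the two filter selectivities independently, roughly 1M × 1M / 1M × 0.5 × 0.5 = 250,000 rows, while simple containment (the legacy CE’s assumption, which Microsoft documents trace flag 9476 as restoring under the new CE) assumes the filtered join keys on one side are contained in the filtered keys on the other, roughly 500k × 500k / 500k = 500,000 rows. Treat those figures as back-of-the-envelope arithmetic illustrating the two assumptions, not a transcript of the plans in Joe’s post.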
