We know that the first query will return 500k rows and the second query will return 0 rows. However, can SQL Server know that? Each statistics object only contains information about its own column. There’s no correlation between the
UNIQUE_ID
andMOD_FILTER
columns, so there isn’t a way for SQL Server to know that the queries will return different estimates. The query optimizer can create an estimate based on the filters on theWHERE
clause and on the histograms of the join columns, but there’s no foolproof way to do that calculation. The presence of the filters introduces uncertainty into the estimate, even with statistics that perfectly describe the data for each column. The containment assumption is all about the modeling assumption that SQL Server has to make to resolve that uncertainty.
It’s an interesting post aimed at trying to get you to think like a simplified cardinality estimator. SQL Server doesn’t behave exactly like this, but it’s a good mental reference point.
Comments closed