Category: Statistics

Understanding The Cardinality Estimator

SQL Scotsman is working on a very interesting series on statistics and the different cardinality estimators; so far, it runs to three parts. Part one is an overview:

A few of those assumptions changed in the new SQL Server 2014/2016 CE, namely:

  • Independence becomes Correlation: In the absence of existing multi-column statistics, the legacy CE views the distribution of data contained across different columns as uncorrelated with one another. This assumption of independence often does not reflect the reality of a typical SQL Server database schema, where implied correlations do actually exist. The new CE uses an increased correlation assumption for multiple predicates and an exponential backoff algorithm to derive cardinality estimates (a worked example follows this list).

  • Simple Join Containment becomes Base Join Containment: Under the legacy CE, the assumption is that non-join predicates are somehow correlated, which is called “Simple Containment”. For the new Cardinality Estimator, these non-join predicates are assumed to be independent (called “Base Containment”), which can translate into a reduced row estimate for the join. At a high level, the new CE derives the join selectivity from base-table histograms without first scaling them down by the associated filter predicates; the selectivity of the non-join filters is applied afterwards.
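
To make the backoff concrete, here is a minimal sketch comparing the two models. The row count and selectivities are invented for illustration; the new CE sorts the predicate selectivities from most to least selective and halves the exponent for each subsequent one:

    -- Hypothetical: a 1,000,000-row table with three filter predicates whose
    -- individual selectivities are 0.1, 0.2 and 0.5 (most selective first).
    DECLARE @rows float = 1000000,
            @s1   float = 0.1,
            @s2   float = 0.2,
            @s3   float = 0.5;

    SELECT @rows * @s1 * @s2 * @s3 AS legacy_estimate,                        -- full independence: 10,000 rows
           @rows * @s1 * POWER(@s2, 0.5) * POWER(@s3, 0.25) AS new_estimate;  -- exponential backoff: ~37,606 rows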

Part two looks at trace flag 9481:

When To Use Trace Flag 9481

Query Scope: You’ve moved (migrated/upgraded) to SQL Server 2014 / 2016, your databases are at compatibility level 120 / 130 and using the new CE, and your workload is performing well overall, but there are a few regressions where a small number of queries actually perform worse. Use Trace Flag 9481 on a per-query basis as a temporary measure until you can tune / rewrite the query so it performs well without the hint.
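
As a usage sketch (the table and predicate below are placeholder names, and note that QUERYTRACEON requires elevated permissions unless wrapped in a plan guide), the per-query form looks like this:

    -- Force the legacy (pre-2014) cardinality estimator for this query only.
    SELECT OrderID, OrderDate
    FROM dbo.Sales
    WHERE CustomerID = 42
    OPTION (QUERYTRACEON 9481);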

Part three discusses database scoped configurations in SQL Server 2016:

The problem with lowering the database compatibility level is that you can’t leverage the new engine functionality available under the latest compatibility level.

This problem was solved in SQL Server 2016 with the introduction of Database Scoped Configurations, which give you the ability to make several database-level configuration changes for properties that were previously configured at the instance level. In particular, the LEGACY_CARDINALITY_ESTIMATION database scoped configuration allows you to set the cardinality estimation model independent of the database compatibility level. This option allows you to leverage all new functionality provided with compatibility level 130 but still use the legacy CE on the off chance that the latest CE causes severe query regressions across your workload.
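
For reference, flipping that switch is a one-line, documented command, run in the target database:

    -- Keep compatibility level 130 but revert this database to the legacy CE.
    ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

    -- Return to the new CE later:
    ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = OFF;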

The article on statistics is quite long for a blog post and a great read.  I’m looking forward to reading more.

The Value Of Unused Indexes

Erik Darling provides a scenario in which an index that does not get used in an execution plan can nonetheless help query performance:

We can see an example of this with unique indexes and constraints, but another possibility is that the created index had better statistical information via the histogram. When you add an index, you get Fresh Hot Stats, whereas the index you were using could be many modifications behind current for various reasons: you have a big table and don’t hit auto-update thresholds often, you’re not manually updating statistics somehow, or you’re running into ascending key weirdness. These are all sane potential reasons. One insane potential reason is if you have auto-create stats turned off, and the index you create is on a column that didn’t have a statistics object associated with it. But you’d see plan warnings about operators not having associated statistics.

Again, we’re going to focus on how ADDING an index your query doesn’t use can help. I found out the hard way that both unique indexes and constraints can cease being helpful to cardinality estimation when their statistics get out of date.

This is sort of like a triple bank shot solution:  even if it works that one time, there are easier ways to do it—and those ways are more likely to succeed to boot.
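
The most obvious of those easier ways is to refresh the statistics you already have rather than create a new index just to get a fresh histogram; a minimal sketch with placeholder object names:

    -- Refresh one statistics object (here, the one backing an index) with a full scan.
    UPDATE STATISTICS dbo.BigTable IX_BigTable_SomeColumn WITH FULLSCAN;

    -- Or refresh every statistics object on the table.
    UPDATE STATISTICS dbo.BigTable WITH FULLSCAN;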

Check Those Estimates

Grant Fritchey runs into a statistics issue:

While the number of rows for 1048 was the lowest, at 3, unfortunately it seems that the 1048 values were added to the table after the statistics for the index had been updated. Instead of using something from the histogram, my value fell outside the values in the histogram. When the value is outside the histogram, the Cardinality Estimator uses the average number of rows across the entire histogram, 258.181 (at least for any database that’s in SQL Server 2014 or greater and not running in a compatibility mode), as the row estimate.

Figuring out those boundaries can make the difference between a good plan and a bad plan.
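
If you want to look at those boundaries yourself, the histogram is exposed through DBCC SHOW_STATISTICS (the object and statistics names below are placeholders):

    -- RANGE_HI_KEY shows the upper boundary of each histogram step; a predicate
    -- value above the last RANGE_HI_KEY falls outside the histogram entirely.
    DBCC SHOW_STATISTICS ('dbo.SomeTable', 'IX_SomeTable_Value') WITH HISTOGRAM;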

Azure SQL DW Statistics

Emma Stewart looks at how statistics are created in Azure SQL Data Warehouse:

In Azure SQL Data Warehouse, statistics have to be created manually. On previous SQL Server projects, creating and maintaining statistics wasn’t something that we had to incorporate into our design (and really think about!); however, with SQL DW we need to make sure we think about how to include it in our process, in order to take advantage of the benefits of working with Azure DW.

The major selling point of Azure SQL Data Warehouse is that it is capable of processing huge volumes of data; one of the specific performance optimisations made to support this is the distributed query optimiser. Using the information obtained from the statistics (information on data size and distribution), the service is able to optimise queries by assessing the cost of specific distributed query operations. Since the query optimiser is cost-based, SQL DW will always choose the plan with the lowest cost.
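
Since the service won’t create them for you, the manual step looks something like this (table, column, and statistics names are placeholders):

    -- Azure SQL Data Warehouse: statistics must be created explicitly.
    CREATE STATISTICS stat_FactSales_DateKey
    ON dbo.FactSales (DateKey);

    -- Sampled by default; FULLSCAN trades a longer scan for better accuracy.
    CREATE STATISTICS stat_FactSales_CustomerKey
    ON dbo.FactSales (CustomerKey) WITH FULLSCAN;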

Azure SQL Data Warehouse is a bit of a strange animal, with differences in statistics being one of the smaller changes versus “classic” SQL Server.

Change With Automatic Stats Update

Jack Li notes that SQL Server 2016 has changed when automatic statistics update gets called:

Old threshold: it takes a 20% row change before auto update stats kicks in (there are some tweaks for small tables; for large tables, a 20% change is needed). For a table with 100 million rows, that means 20 million rows have to change before auto stats kicks in. For the vast majority of large tables, auto stats basically doesn’t do much.

New threshold: Starting with SQL Server 2008 R2 SP1, we introduced trace flag 2371 to control auto update statistics better (the new threshold). Under trace flag 2371, the percentage of changes required is dramatically reduced for large tables. In other words, trace flag 2371 can cause more frequent updates. This new threshold is off by default and is enabled by the trace flag. But in SQL Server 2016, the new threshold is enabled by default for a database with compatibility level 130.
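
The difference is easy to put into numbers. For large tables, the new threshold is commonly approximated as SQRT(1000 * rows); that formula is the widely cited approximation rather than anything from Jack’s post. For the 100-million-row table above:

    -- Old threshold: 20% of 100,000,000 rows must change.
    SELECT 0.20 * 100000000 AS old_threshold;         -- 20,000,000 rows

    -- New threshold (trace flag 2371 / compat level 130): ~SQRT(1000 * rows).
    SELECT SQRT(1000.0 * 100000000) AS new_threshold; -- ~316,228 rows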

Important to know.

DBCC SHOW_STATISTICS Update

Erik Darling notes that his Connect item to replace DBCC SHOW_STATISTICS has been marked as resolved:

So what does it look like?

I have no idea. I don’t know if it’s a DMV or a function, I don’t know what it’s called, and I don’t know what information it exposes. I also don’t know how it will get joined to other DMVs. There were no details offered up when the status changed. And I’m fine with that! I’m pretty psyched that it got enough traction to get a fix to begin with. If anyone from MS feels like shooting me an email with details, I won’t complain.

But since we don’t know, we’re free to speculate. Like all those History Channel shows about aliens and fake animals and where the Templars secretly buried Jesus’ gold teeth in Arizona. It’ll be fun!

It’ll be interesting to see the results.

Using Statistics For Index Design

Kendra Little argues that you should not use automatically created statistics as a guide for index creation:

We’ve talked a lot so far about how much statistics and indexes are related. This is why it seems like statistics might be useful for designing indexes!

But here’s the thing — SQL Server doesn’t track and report on how many times a statistic was used during optimization.

This is an interesting discussion.

Multi-Column Statistics

Raul Gonzalez looks at how the different cardinality estimators handle multi-column statistics:

The thing we can learn from this is that it is impossible to always be right when you have to estimate the number of rows and your only resource is statistics; it doesn’t matter whether they are single- or multi-column, there is a set of values out there ready to defeat your logic.

However, I think it’s a good idea that SQL Server 2016 goes back to looking into multi-column statistics, for a simple reason: these are user-created stats and therefore give us (DBAs, devs) more power over how rows are estimated.
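
Creating one is a single statement; a minimal sketch with placeholder names (column order matters, since the histogram covers only the leading column while the density vector covers the combinations):

    -- Multi-column statistics over a correlated pair of columns.
    CREATE STATISTICS st_Orders_Country_City
    ON dbo.Orders (Country, City) WITH FULLSCAN;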

Multi-column stats are probably among the most under-utilized tools in SQL Server.

Index-Based Statistics Updates

Michael Bourgon has a script to get information on statistics updates for statistics based on indexes:

Quickie, based off an earlier post. (http://thebakingdba.blogspot.com/2012/02/tuning-statistics-when-were-they.html)

Get the last 4 stat updates for every statistic based on an index. The filter is on the auto_created column; flip that to get all the system-created statistics as well.

This does use the DBCC SHOW_STATISTICS command, which reminds me of a rant (though not about Michael’s code; it’s about the need to use this DBCC command rather than having a nice DMV which returns all of the relevant information).
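
For the header-level pieces (last update time, row counts, modification counter) there is at least sys.dm_db_stats_properties on reasonably recent builds; the histogram itself still requires the DBCC command. A sketch, with a placeholder table name:

    -- Last update time and modification counter for the index-based statistics on a table.
    SELECT s.name, sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID('dbo.BigTable')
      AND s.auto_created = 0;  -- flip to 1 for the auto-created statistics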

Duplicate Statistics

Shaun J. Stuart discusses removing duplicate statistics:

I puzzled on this for a bit and got sidetracked by the strange way SSMS displays statistics columns on the Property page. Then it got to be the end of the day and I went home. The next day, I had a comment on my previous post from Aaron Bertrand, who mentioned there is a related bug with the stats_column_id column of the sys.stats_columns view. It does not contain what the MSDN documentation says it contains. The Connect item for this, along with a workaround, can be found here.

The script I was using did not reference that column, but it did get me thinking that perhaps the script was not correctly identifying the first column in an index.

Shaun has an updated version of a duplicate statistics checker script that you may want to check out.
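
The core idea behind such a checker, reduced to a sketch (this is a simplified illustration, not Shaun’s script): find auto-created single-column statistics whose column is also the leading key column of an index, since the index’s own statistics object makes them redundant.

    -- Auto-created statistics duplicating the leading column of an index.
    SELECT OBJECT_NAME(s.object_id) AS table_name,
           s.name                   AS auto_created_stat,
           i.name                   AS overlapping_index
    FROM sys.stats AS s
    JOIN sys.stats_columns AS sc
      ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
    JOIN sys.index_columns AS ic
      ON ic.object_id = s.object_id
     AND ic.column_id = sc.column_id
     AND ic.key_ordinal = 1          -- leading key column only
    JOIN sys.indexes AS i
      ON i.object_id = ic.object_id AND i.index_id = ic.index_id
    WHERE s.auto_created = 1;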
