Press "Enter" to skip to content

Category: Statistics

Multi-Column, Auto-Created Statistics

Shaun J. Stuart looks into a scenario in which it appears that multi-column, auto-created statistics were generated:

Wow. That sure looks like three auto-created, multi-column statistics! We have three stats: stats_ids 3, 4, and 5. The sys.stats_columns table contains one row for each column that is in a statistic, so multiple rows for a single statistic (i.e., a single stats_id value) indicate multiple columns in that stat. Indeed, the column_id values indicate the table columns contained in the stat. So stats_id 3 contains columns VersionMajor and ApplicationID (column_ids 3 and 1), stats_id 4 contains columns VersionMinor and ApplicationID (column_ids 4 and 1), and stats_id 5 contains columns VersionRevision and ApplicationID (column_ids 5 and 1). And, clearly, the auto_created flag is true, so these three stats were auto-created. What’s going on?
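
For reference, a diagnostic query along these lines reproduces the output Shaun describes (a sketch; dbo.YourTable is a placeholder for the table in question):

    -- List each statistic on a table along with its columns.
    -- sys.stats_columns has one row per column per statistic.
    SELECT s.stats_id,
           s.name AS stat_name,
           s.auto_created,
           sc.stats_column_id,
           c.name AS column_name
    FROM sys.stats AS s
    JOIN sys.stats_columns AS sc
        ON sc.object_id = s.object_id
        AND sc.stats_id = s.stats_id
    JOIN sys.columns AS c
        ON c.object_id = sc.object_id
        AND c.column_id = sc.column_id
    WHERE s.object_id = OBJECT_ID('dbo.YourTable')
    ORDER BY s.stats_id, sc.stats_column_id;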

Read on for the answer.

Trace Flag 2389

Erin Stellato looks at using Trace Flag 2389 with the new cardinality estimator in SQL Server 2014:

To summarize, when using compatibility mode 110 or below, trace flag 2389 works like it always has.  But when using compatibility mode 120 or higher, and thus the new CE, the estimates are not the same compared to the old CE, and in this specific case, are not that different whether using the trace flag or not.

So what should you do?  Test, as always.  I haven’t found anything documented in MSDN that states that trace flag 2389 is not supported with compatibility mode 120 and higher, nor have I found anything that documents a change in behavior.  I do find it very interesting that the estimates are different (in this case much lower) with the new CE.  That could potentially be an issue, but there are multiple factors in play when it comes to estimates, and this was a very simple query (one table, one predicate).  In this case, the estimate is way off (4920 rows versus the 22,595 rows for the June 5 date).
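
For context, trace flag 2389 (ascending key detection) can be scoped to a single query with the QUERYTRACEON hint, and which CE you get follows the database compatibility level. A minimal sketch, using a hypothetical dbo.Orders table:

    -- Old CE: compatibility level 110 (SQL Server 2012)
    ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 110;

    -- Enable ascending-key detection for this query only
    -- (QUERYTRACEON requires sysadmin)
    SELECT COUNT(*)
    FROM dbo.Orders
    WHERE OrderDate >= '20160605'
    OPTION (QUERYTRACEON 2389);

    -- New CE: compatibility level 120 (SQL Server 2014)
    ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 120;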

I highly recommend reading this article.

Statistic Column Sort Order

Shaun J. Stuart points out an inconsistency in display order for columns on a statistic:

What’s going on? Why are the columns in the statistic not in the same order as the columns in the index? Well, it turns out, they are. If we look on the Details page, we see the density vector is, in fact, created as Col2, Col1, Col3, which is the order of the columns in the index:
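
The density vector he mentions is easy to check yourself; a quick sketch against a hypothetical dbo.TestTable with a statistic named IX_Test:

    -- Show only the density vector for a statistic; the column
    -- list in each 'All density' row reflects the true key order.
    DBCC SHOW_STATISTICS ('dbo.TestTable', 'IX_Test') WITH DENSITY_VECTOR;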

Read the whole thing to avoid confusion next time you look at the statistics GUI.

Limitations Of Statistics

Erik Darling discusses statistics and limitations of statistics outside of indexes:

Head rush moment: SQL may use information from histograms outside of the index it chooses for cardinality estimation.

Back to earth: If you just create a bunch of statistics instead of indexes, you’re (at best) using your Clustered Index for everything (which is still bad), or you’re using a HEAP for everything (which is usually worse). You’re still generally better off creating good indexes for your workload. They’ll get statistics objects created and associated with them, and if SQL thinks another column is interesting, it will create a single-column statistics object for it, as long as you haven’t turned off auto-create stats.

Sure, you can put on your black cloak and goat mask and create some multi-column or filtered statistics, but in the words of a wise man (Doug), you end up with more stats to maintain and understanding query behavior gets more difficult.
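
For the record, the multi-column and filtered statistics Erik alludes to look like this (a sketch with hypothetical table and column names):

    -- Multi-column statistic: the density vector covers the column
    -- combination, but the histogram is built only on the leading column.
    CREATE STATISTICS st_City_State
    ON dbo.Customers (City, State);

    -- Filtered statistic: built over only the rows matching the predicate
    CREATE STATISTICS st_ActiveOrderDates
    ON dbo.Orders (OrderDate)
    WHERE Status = 'Active';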

Erik argues that at the margin, keeping existing statistics up to date tends to be a better solution than creating new statistics.

Indexes And Stats

Brent Ozar looks at a case where adding a suggested index monkeys with stats:

The query runs faster, make no mistake – but check out the estimates:

  • Estimated number of rows = 1
  • Actual number of rows = 165,367

Those estimates are built by SQL Server’s cardinality estimator (CE), and there have been major changes to it over the last couple of versions. You can control which CE you’re using by changing the database’s compatibility level. This particular StackOverflow database is running in 2016 compat mode – so what happens if we switch it back to 2012 compat mode?
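
Switching the CE via compatibility level is a one-liner (assuming a database named StackOverflow; on SQL Server 2016 you can also keep the newer compat level and force the old CE with a scoped configuration):

    -- 2012 compat mode: legacy cardinality estimator
    ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 110;

    -- 2016 compat mode: new cardinality estimator
    ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 130;

    -- Alternative on 2016+: keep the compat level, use the old CE anyway
    ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;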

Based on this result, there might be further optimizations available.  Read on for more of Brent’s thoughts.

Understanding Estimation Variance

Aaron Bertrand discusses drift between estimates and actuals in execution plans:

Now, inaccurate estimates won’t always be a problem, but they can cause issues with inefficient plan choices at the two extremes. A single plan might not be optimal when the chosen range will yield a very small or very large percentage of the table or index, and this can get very hard for SQL Server to predict when the data distribution is uneven. Joseph Sack outlined the more typical things bad estimates can affect in his post, “Ten Common Threats to Execution Plan Quality.”
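
To see why uneven distributions are hard to estimate, you can inspect the histogram steps directly; a sketch using sys.dm_db_stats_histogram (available from SQL Server 2016 SP1; the table name and stats_id here are hypothetical):

    -- Wide swings in range_rows/equal_rows between steps mark the
    -- regions where range-predicate estimates are likely to drift.
    SELECT h.step_number,
           h.range_high_key,
           h.range_rows,
           h.equal_rows,
           h.average_range_rows
    FROM sys.dm_db_stats_histogram(OBJECT_ID('dbo.Orders'), 1) AS h
    ORDER BY h.step_number;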

Read the whole thing.

Where Polybase Stats Live

I dig into where the statistics against a Polybase table actually live:

Today, we learned that Polybase statistics are stored in the same way as other statistics; as far as SQL Server is concerned, they’re just more statistics built from a table (remembering that the way stats get created involves loading data into a temp table and building stats off of that temp table).  We can do most of what you’d expect with these stats, but beware calling sys.dm_db_stats_properties() on Polybase stats, as they may not show up.

Also, remember that you cannot maintain, auto-create, auto-update, or otherwise modify these stats.  The only way to modify Polybase stats is to drop and re-create them, and if you’re dealing with a large enough table, you might want to take a sample.
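
The drop-and-recreate cycle with a sample looks like this (a sketch; the external table and column names are hypothetical):

    -- Polybase stats can't be updated in place: drop and re-create,
    -- sampling to keep the temp-table load reasonable on big tables.
    DROP STATISTICS dbo.MyExternalTable.st_SaleAmount;

    CREATE STATISTICS st_SaleAmount
    ON dbo.MyExternalTable (SaleAmount)
    WITH SAMPLE 10 PERCENT;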

The result isn’t very surprising in retrospect, and it’s good that “stats are stats are stats” is the correct answer.

Messing With Statistics

Erik Darling shows how to fake stats:

One thing I’ve found is that the inflated counts don’t seem to change anything for Identities or Primary Keys. You’ll always get very reasonable plans and estimates regardless of how high you set row and page counts for those. Regular old clustered indexes are fair game.

Some really interesting things can start to happen to execution plans when SQL thinks there are this many rows in a table. The first is that SQL will use a rare (in my experience) plan choice: Index Intersection. You can think of this like a Key Lookup, but between two nonclustered indexes rather than from one nonclustered index to the clustered index.

This is very useful when you don’t have many rows in dev, can’t put many rows in dev, and can’t restore a stats-only database from prod.
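
The usual mechanism for faking counts is the undocumented ROWCOUNT and PAGECOUNT options of UPDATE STATISTICS (a dev-only sketch; the table name is hypothetical):

    -- Tell the optimizer this table 'has' ten million rows across a
    -- million pages; the actual data is untouched.
    UPDATE STATISTICS dbo.SmallDevTable
    WITH ROWCOUNT = 10000000, PAGECOUNT = 1000000;

    -- Put the real counts back when you're done experimenting
    DBCC UPDATEUSAGE (0, 'dbo.SmallDevTable') WITH COUNT_ROWS;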

Stats Terminology

Erik Darling fills in gaps on statistics terminology in his unique style:

SELECTIVITY

This tells you how special your snowflakes are. When a column is called “highly selective,” that usually means values aren’t repeating all that often, if at all. Think about order numbers, identity or sequence values, GUIDs, etc.

DENSITY

This is sort of the anti-matter to selectivity. Highly dense columns aren’t very unique. They’ll return a lot of rows for a given value. Think about Zip Codes, Gender, Marital Status, etc. If you were to select all the people in 10002, a densely (there’s that word again) populated zip code in Chinatown, you’d probably wait a while, kill the query, and add another filter.
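
Density has a simple definition: 1 divided by the number of distinct values, so you can sanity-check a column yourself (hypothetical table, column, and statistic names):

    -- Density = 1 / (# of distinct values); closer to 1 means denser.
    SELECT 1.0 / COUNT(DISTINCT ZipCode) AS density
    FROM dbo.People;

    -- Compare against the 'All density' column of the statistics object
    DBCC SHOW_STATISTICS ('dbo.People', 'st_ZipCode') WITH DENSITY_VECTOR;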

Combine that with Kendra Little’s statistics FAQ for additional learning.

Statistics FAQ

Kendra Little has a great FAQ on statistics, from the standpoint of developers as well as administrators:

I’ve been asked a lot of questions about updating statistics in SQL Server over the years. And I’ve asked a lot of questions myself! Here’s a rundown of all the practical questions that I tend to get about how to maintain these in SQL Server.

I don’t dig into the internals of statistics and optimization in this post. If you’re interested in that, head on over and read the fahhhbulous white paper, Statistics Used by the Query Optimizer in SQL Server 2008. Then wait a couple days and chase it with its charming cousin, Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator.

I’m also not talking about statistics for memory optimized tables in this article. If you’re interested in those, head over here.

This is a great distillation of important and oft-misunderstood content.
