Press "Enter" to skip to content

Category: Statistics

Parallel Stats Sampling

SQL Scotsman shows which statistics-building operations are parallel and which are single-threaded:

“Starting with SQL Server 2016, sampling of data to build statistics is done in parallel, when using compatibility level 130, to improve the performance of statistics collection. The query optimiser will use parallel sample statistics whenever a table size exceeds a certain threshold.”

As per the previous demos, prior to SQL Server 2016, the gathering of sampled statistics are serial, single-threaded operations.  Only statistic operations gathered using a full scan would qualify for parallelism.  Now in SQL Server 2016, all automatic and manual statistic operations qualify for parallelism.

He also has a neat trick for invalidating stats on a large table, so check out this article-length blog post.

Comments closed

Stats Histogram DMV

Erik Darling looks at a new DMV in vNext CTP 1.1:

It’s not exactly perfect

For instance, if you just let it loose without filters, you get a severe error. The same thing happens if you try to filter on one of the columns in the function, rather than a column in sys.stats, like this.

Very cool.  It’s one step closer to us removing our dependencies on DBCC SHOW_STATISTICS.

Comments closed

Synchronous Or Asynchronous Stats Updates

SQL Scotsman explains synchronous versus asynchronous stats updates:

With that said, it would seem that asynchronous statistics are better suited to OLTP environments and synchronous statistics are better suited to OLAP environments.  As synchronous statistics are the default though, people are reluctant to change this setting without good reason to.  I’ve worked exclusively in OLTP environments over the last few years and have never seen asynchronous statistics rolled out as the default.  I have personally been bitten by synchronous statistic updates on large tables causing query timeouts which I resolved by switching to asynchronous statistic updates.

This is an interesting, nuanced take on the issue.  My bias is toward asynchronous stats updates because I have been burned, but it’s interesting to read someone thinking through the implications of this seemingly simple choice.

Comments closed

Truncate Table And Stats

Kendra Little shows that TRUNCATE TABLE does not always reset stats:

You might expect to see that the statistic on Quantity had updated. I expected it, before I ran through this demo.

But SQL Server never actually had to load up the statistic on Quantity for the query above. So it didn’t bother to update the statistic. It didn’t need to, because it knows that the table is empty, and this doesn’t show up in our column or index specific statistics.

Check it out.

Comments closed

Query Optimizer Hotfixes

SQL Scotsman covers the query optimizer hotfixes which you can turn on with trace flag 4199:

The query optimiser hotfixes contained under Trace Flag 4199 are intentionally not enabled by default.  This means when upgrading from SQL Server 2008 R2 to SQL Server 2012 for example, new query optimiser logic is not enabled.   The reason behind this according to the article linked above is to prevent plan changes that could cause query performance regressions.  This makes sense for highly optimised environments where application critical queries are tuned and rely on specific execution plans and any change in query optimiser logic could potentially cause unexpected / unwanted query regressions.

Read the whole thing.

Comments closed

Checking On Stats

Kendra Little has a script to get vital information regarding a table’s statistics:

For most modern versions of SQL Server, I like to join to sys.dm_db_stats_properties() — you can get a LOT of detail in a single query! (This works with SQL Server 2008 R2 SP2+ / SQL Server 2012 SP1+ / All higher versions)

Here’s the query, looking at a sample table in the WideWorldImporters database:

Click through for the script, as well as a version which works on 2005 and 2008.

Comments closed

Forcing The Legacy Cardinality Estimator

SQL Scotsman has three methods for using the legacy Cardinality Estimator in specific circumstances even after you’ve switched over to the new:

The problem with lowering the database compatibility level is that you can’t leverage the new engine functionality available under the latest compatibility level.

This problem was solved in SQL Server 2016 with the introduction of Database Scoped Configurations which gives you the ability to make several database-level configuration changes for properties that were previously configured at the instance-level.  In particular, the LEGACY_CARDINALITY_ESTIMATION database scoped configuration allows you to set the cardinality estimation model independent of the database compatibility level.This option allows you to leverage all new functionality provided with compatibility level 130 but still use the legacy CE in the odd chance that the latest CE casuses severe query regressions across your workload.

Each has its own specific use cases, so it’s good to know all three.

Comments closed

Understanding The Cardinality Estimator

SQL Scotsman is working on a very interesting series on statistics and the different cardinality estimators.  So far, this is a three-part series.  Part one is an overview:

A few of those assumptions changed in the new  SQL Server 2014/2016 CE, namely:

  • Independence becomes Correlation: In absence of existing multi-column statistics, the legacy CE views the distribution of data contained across different columns as uncorrelated with one another. This assumption of independence often does not reflect the reality of a typical SQL Server database schema, where implied correlations do actually exist. The new CE uses an increased correlation assumption for multiple predicates and an exponential back off algorithm to derive cardinality estimates.

  • Simple Join Containment becomes Base Join Containment: Under the legacy CE, the assumption is that non-join predicates are somehow correlated which is called “Simple Containment”. For the new Cardinality Estimator, these non-join predicates are assumed to be independent (called “Base Containment”), and so this can translate into a reduced row estimate for the join. At a high level, the new CE derives the join selectivity from base-table histograms without scaling down using the associated filter predicates. Instead the new CE computes join selectivity using base-table histograms before applying the selectivity of non-join filters.

Part two looks at trace flag 9481:

When To Use Trace Flag 9481

Query Scope:  You’ve moved (migrated/upgraded) to SQL Server 2014 / 2016, your databases are at compatibility level 120 / 130 and using the new CE, your workload is performing well overall but there are a few regressions where a small number of queries actually perform worse.  Use Trace Flag 9481 on a per query basis as a temporary measure until you can tune / rewrite the query so it performs well without the hint.

Part three discusses database scoped configurations in SQL Server 2016:

The problem with lowering the database compatibility level is that you can’t leverage the new engine functionality available under the latest compatibility level.

This problem was solved in SQL Server 2016 with the introduction of Database Scoped Configurations which gives you the ability to make several database-level configuration changes for properties that were previously configured at the instance-level.  In particular, the LEGACY_CARDINALITY_ESTIMATION database scoped configuration allows you to set the cardinality estimation model independent of the database compatibility level. This option allows you to leverage all new functionality provided with compatibility level 130 but still use the legacy CE in the odd chance that the latest CE casuses severe query regressions across your workload.

The article on statistics is quite long for a blog post and a great read.  I’m looking forward to reading more.

Comments closed

The Value Of Unused Indexes

Erik Darling provides a scenario in which an index which does not get used in an execution plan can nonetheless help query performance:

We can see an example of this with unique indexes and constraints, but another possibility is that the created index had better statistical information via the histogram. When you add an index, you get Fresh Hot Stats, whereas the index you were using could be many modifications behind current for various reasons. If you have a big table and don’t hit auto-update thresholds often, if you’re not manually updating statistics somehow, or if you’re running into ascending key weirdness. These are all sane potential reasons. One insane potential reason is if you have autocreate stats turned off, and the index you create is on a column that didn’t have a statistics object associated with it. But you’d see plan warnings about operators not having associated statistics.

Again, we’re going to focus on how ADDING an index your query doesn’t use can help. I found out the hard way that both unique indexes and constraints can cease being helpful to cardinality estimation when their statistics get out of date.

This is sort of like a triple bank shot solution:  even if it works that one time, there are easier ways to do it—and those ways are more likely to succeed to boot.

Comments closed