Category: Statistics

Azure SQL DW Statistics

Emma Stewart looks at how statistics are created in Azure SQL Data Warehouse:

In Azure SQL Data Warehouse, statistics have to be created manually. On previous SQL Server projects, creating and maintaining statistics wasn’t something that we had to incorporate into our design (and really think about!). With SQL DW, however, we need to think about how to include it in our process so that we take advantage of the benefits of working with Azure DW.

The major selling point of Azure SQL Data Warehouse is that it is capable of processing huge volumes of data. One of the specific performance optimisations that has been made is the distributed query optimiser. Using the information obtained from the statistics (information on data size and distribution), the service is able to optimise queries by assessing the cost of specific distributed query operations. Therefore, since the query optimiser is cost-based, SQL DW will always choose the plan with the lowest estimated cost.

Azure SQL Data Warehouse is a bit of a strange animal, with differences in statistics being one of the smaller changes versus “classic” SQL Server.

Change With Automatic Stats Update

Jack Li notes that SQL Server 2016 has changed when automatic statistics update gets called:

Old threshold: it takes 20% of row changes before auto update stats kicks in (there are some tweaks for small tables; for large tables, a 20% change is needed). For a table with 100 million rows, it requires 20 million row changes for auto stats to kick in. For the vast majority of large tables, auto stats basically doesn’t do much.

New threshold: Starting with SQL 2008 R2 SP1, we introduced trace flag 2371 to control auto update statistics better (new threshold). Under trace flag 2371, the percentage of changes required is dramatically reduced for large tables. In other words, trace flag 2371 can cause more frequent updates. This new threshold is off by default and is enabled by the trace flag. But in SQL 2016, this new threshold is enabled by default for a database with compatibility level 130.
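To put numbers on the two thresholds, here is a minimal Python sketch. The old rule is written as 500 rows plus 20% of the table, which matches the 20 million figure quoted above. The new rule is written as the smaller of that value and the square root of 1,000 times the row count. That square-root formula is not in the excerpt; it is an assumption based on how Microsoft's documentation describes the compatibility level 130 behavior. The function names are made up for illustration, and the small-table tweaks are ignored.

```python
import math


def classic_threshold(rows: int) -> float:
    """Approximate modifications needed under the old rule: 500 + 20% of rows."""
    return 500 + 0.20 * rows


def dynamic_threshold(rows: int) -> float:
    """Approximate modifications needed under trace flag 2371 / compat level 130.

    Assumption: the smaller of the classic threshold and sqrt(1000 * rows).
    """
    return min(classic_threshold(rows), math.sqrt(1000 * rows))


if __name__ == "__main__":
    # Compare both rules across a few table sizes.
    for rows in (10_000, 1_000_000, 100_000_000):
        print(
            f"{rows:>11,} rows: "
            f"classic {classic_threshold(rows):>12,.0f}, "
            f"dynamic {dynamic_threshold(rows):>10,.0f}"
        )
```

Under these assumptions, the 100 million row table drops from roughly 20 million modifications to a little over 316,000, while a 10,000 row table is unaffected because the classic value is already the smaller one.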

Important to know.

DBCC SHOW_STATISTICS Update

Erik Darling notes that his Connect item to replace DBCC SHOW_STATISTICS has been marked as resolved:

So what does it look like?

I have no idea. I don’t know if it’s a DMV or a function, I don’t know what it’s called, and I don’t know what information it exposes. I also don’t know how it will get joined to other DMVs. There were no details offered up when the status changed. And I’m fine with that! I’m pretty psyched that it got enough traction to get a fix to begin with. If anyone from MS feels like shooting me an email with details, I won’t complain.

But since we don’t know, we’re free to speculate. Like all those History Channel shows about aliens and fake animals and where the Templars secretly buried Jesus’ gold teeth in Arizona. It’ll be fun!

It’ll be interesting to see the results.

Using Statistics For Index Design

Kendra Little argues that you should not use automatically created statistics as a guide for index creation:

We’ve talked a lot so far about how much statistics and indexes are related. This is why it seems like statistics might be useful for designing indexes!

But here’s the thing — SQL Server doesn’t track and report on how many times a statistic was used during optimization.

This is an interesting discussion.

Multi-Column Statistics

Raul Gonzalez looks at how the different cardinality estimators handle multi-column statistics:

The thing we can learn from this is that it is impossible to always be right when you have to estimate the number of rows if your only resource is statistics. Whether they are single or multi-column, there is a set of values out there ready to defeat your logic.

However, I think it’s a good idea that SQL Server 2016 goes back to looking at multi-column statistics for a simple reason: these are user-created stats and therefore give us (DBAs, devs) more power over how rows are estimated.

Multi-column stats are probably among the most under-utilized tools in SQL Server.

Index-Based Statistics Updates

Michael Bourgon has a script to get information on statistics updates for stats based off of indexes:

Quickie, based off an earlier post. (http://thebakingdba.blogspot.com/2012/02/tuning-statistics-when-were-they.html)

Get the last 4 stat updates for every statistic based on an index. The filter is on the auto_created; flip that to get all the system

This does use the DBCC SHOW_STATISTICS command, which reminds me of a rant (though not about Michael’s code; it’s about the need to use this DBCC command rather than having a nice DMV which returns all of the relevant information).

Duplicate Statistics

Shaun J. Stuart discusses removing duplicate statistics:

I puzzled on this for a bit and got sidetracked by the strange way SSMS displays statistics columns on the Property page. Then it got to be the end of the day and I went home. The next day, I had a comment on my previous post from Aaron Bertrand who mentioned there is a related bug with the stats_column_id column of the sys.stats_columns view. It does not contain what the MSDN documentation says it contains. The Connect item for this, along with a workaround, can be found here.

The script I was using did not reference that column, but it did get me thinking that perhaps the script was not correctly identifying the first column in an index.

Shaun has an updated version of a duplicate statistics checker script that you may want to check out.

Multi-Column, Auto-Created Statistics

Shaun J. Stuart looks into a scenario in which it appears that multi-column, auto-created statistics were generated:

Wow. That sure looks like three auto-created, multi-column statistics! We have three stats: stats_ids 3, 4, and 5. The sys.stats_columns view contains one row for each column that is in a statistic, so multiple rows for a single statistic (i.e., a single stats_id value) indicate multiple columns in that stat. Indeed, the column_id values indicate the table columns contained in the stat. So stats_id 3 contains columns VersionMajor and ApplicationID (column_ids 3 and 1), stats_id 4 contains columns VersionMinor and ApplicationID (column_ids 4 and 1), and stats_id 5 contains columns VersionRevision and ApplicationID (column_ids 5 and 1). And, clearly, the auto_created flag is true, so these three stats were auto-created. What’s going on?
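The "one row per column per statistic" reasoning can be sketched in a few lines of Python. The rows below are typed in by hand from the stats_id and column_id values in the excerpt; nothing here queries a real database, and the row order is simply the order in which the excerpt lists the columns. The helper name is hypothetical.

```python
from collections import defaultdict

# (stats_id, column_id) pairs as described in the excerpt, hand-copied.
stats_columns = [(3, 3), (3, 1), (4, 4), (4, 1), (5, 5), (5, 1)]

# column_id -> column name, also taken from the excerpt.
column_names = {
    1: "ApplicationID",
    3: "VersionMajor",
    4: "VersionMinor",
    5: "VersionRevision",
}


def columns_per_stat(rows):
    """Group column names by stats_id, preserving the input row order."""
    grouped = defaultdict(list)
    for stats_id, column_id in rows:
        grouped[stats_id].append(column_names[column_id])
    return dict(grouped)


# A statistic with more than one row is, on the face of it, multi-column.
multi_column = {
    stats_id: cols
    for stats_id, cols in columns_per_stat(stats_columns).items()
    if len(cols) > 1
}

if __name__ == "__main__":
    for stats_id, cols in multi_column.items():
        print(stats_id, cols)
```

All three statistics come back with two columns each, which is exactly the apparent contradiction the post goes on to explain.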

Read on for the answer.

Trace Flag 2389

Erin Stellato looks at using Trace Flag 2389 with the new cardinality estimator in SQL Server 2014:

To summarize, when using compatibility mode 110 or below, trace flag 2389 works like it always has.  But when using compatibility mode 120 or higher, and thus the new CE, the estimates are not the same compared to the old CE, and in this specific case, are not that different whether using the trace flag or not.

So what should you do?  Test, as always.  I haven’t found anything documented in MSDN that states that trace flag 2389 is not supported with compatibility mode 120 and higher, nor have I found anything that documents a change in behavior.  I do find it very interesting that the estimates are different (in this case much lower) with the new CE.  That could potentially be an issue, but there are multiple factors in play when it comes to estimates, and this was a very simple query (one table, one predicate).  In this case, the estimate is way off (4920 rows versus the 22,595 rows for the June 5 date).

I highly recommend reading this article.

Statistic Column Sort Order

Shaun J. Stuart points out an inconsistency in display order for columns on a statistic:

What’s going on? Why are the columns in the statistic not in the same order as the columns in the index? Well, it turns out, they are. If we look on the Details page, we see the density vector is, in fact, created as Col2, Col1, Col3, which is the order of the columns in the index:

Read the whole thing to avoid confusion next time you look at the statistics GUI.
