Let me rephrase: before you even start playing around with statistics, make sure you haven’t taken away SQL Server’s ability to do this for you.
I like to make fun of a lot of SQL Server’s built-in “auto-tuning” capabilities that do a pretty terrible job. Cost Threshold for Parallelism of 5? MAXDOP 0? Missing index hints that include every column in the table? Oooookeydokey.
But there are some things that SQL Server has been taking care of for years, and automatically creating statistics is one of ’em.
There are edge cases where statistics auto-creation isn’t the best thing, but for the great majority of cases, it is a big positive.
I had a lot of issues when I created my first one, and after discussing with some folks, they had the same issues. I searched for the best blog posts that I could find on the subject, and the one I LOVED the most was here: Arctic DBA. He broke it down so simply, that I finally created my own pseudo installer and I wanted to share it with all of you. Please, bear in mind, these code snippets may fail at anytime due to changes in Azure.
These next steps assume the following:
You have created/configured your Azure Automation Account and credential to use to execute this runbook.
Read on for a reasonably short Powershell script and a modified version of Ola Hallengren’s index maintenance procedures.
Interestingly, if SQL Server has auto-created a stats object on a column, and you subsequently modify that column, you receive no such error. SQL Server silently drops the statistics object, and modifies the column. The auto-created stats object is *not* automatically recreated until a query is executed that needs the stats object. This difference in how auto-created stats and manually created stats are treated by the engine can make for some confusion.
Just one more thing to think about if you manually create statistics on tables. But at least the error message is clear.
If you were to look at sys.indexes, you would see that these two indexes use index_id values of 1 and 3. The value 2 is skipped. It’s not because there used to be an index that was deleted after the index_id 3 index was created. It’s simply because of the relationship that index_id = stats_id, and there is already a statistic with stats_id = 2. When creating the index for the primary key, index_id 2 had to be skipped.
Check it out for additional insights.
Years ago someone said “Hey – why not drop auto-created stats, since the stats you need will just get created again and you’ll end up getting rid of those you no longer need.” That *may* be a reasonable step on some systems. If the risk of bad plans on first execution of a query needed stats that have been dropped is too high, its a bad deal. If the potential concurrent cost of auto-creating dropped stats is too high, that’s a bad deal. What about analyzing query plans over some period of time to see which stats are actually used in those plans? Then auto-stats which aren’t used in that set of plans could be dropped.
That type of stats analysis could have other uses, too. Prioritizing stats manual stats updates in regular maintenance comes to mind. Or, determining what stats to create/update on an Always On Availability Group primary based on secondary activity. And troubleshooting problem queries or identifying suspicious “watchlist” stats based on highly variable queries/plans they are involved with.
So I created this blog post almost 4 years ago. And now I’ll plead with you to not use the query there… it’s awful. If you want to query trace flag 8666 style stats from plan XML, please start from the query in this post instead – its much more well behaved 🙂
Read on for the script.
With this option, if no rows have changed, the statistic will not be updated. If one or more rows have changed, the statistic will be updated.
Since the release of SP1 for 2012, this has been my only challenge with Ola’s scripts. In SQL Server 2008R2 SP2 and SQL Server 2012 SP1 they introduced the sys.dm_db_stats_properties DMV, which tracks modifications for each statistic. I have written custom scripts to use this information to determine if stats should be updated, which I’ve talked about here. Jonathan has also modified Ola’s script for a few of our customers to look at sys.dm_db_stats_properties to determine if enough data had changed to update stats, and a long time ago we had emailed Ola to ask if he could include an option to set a threshold. Good news, that option now exists!
Keeping statistics up to date is a nice way of quietly improving performance in a system.
Statistics are made up of three parts. Each part tells the optimizer important information regarding the make up the table’s data distribution.
Header – Last Time Stats were updated and number of sample rows
Density Vector – Uniqueness of the columns or set of columns
Histogram– Data’s distribution and frequency of distinct values
Let’s look at a Header, Density and Histogram example.
You can read what the statistic are broken down into using DBCC SHOW_STATISTICS. All field definitions are taken from MSDN.
This is from AdventureWorks2016CTP3 sample database, if you want to follow along. Using the Sales. SalesOrderDetail table let’s look the stats and see what we can find out what it shows us.
Read the whole thing.
There are a variety of methods we use for helping customers upgrade to a new SQL Server version, and one question we often get asked is whether or not statistics need to be updated as part of the upgrade process.
Yes. Update statistics after an upgrade. Further, if you’re upgrading to 2012 or higher from an earlier version, you should rebuild your indexes (which will update index statistics, so then you just need to update column statistics).
Make this one of your “Not too long; did read” posts of the day.
A second option is to use statistics profiling. This was introduced in SQL Server 2014 and is easily set by using SET STATISTICS PROFILE ON orenable query profiling globally using DBCC TRACEON (7412, -1). This trace flag is only available in SQL Server 2016 SP1 and above. Selecting from the dynamic management view (DMV) Sys.dm_exec_query_profiles you can do real time query execution progress monitoring while the query is running. This option will return estimated and actual rows by operator.
Click through for the full set of methods.
A model variation is a new concept in the cardinality estimation framework 2014, that allows easily turn on and off some model assumptions and cardinality estimation algorithms. Model variations are based on a mechanism of pluggable heuristics and may be used in special cases. I think they are left for Microsoft support to be able to address some client’s CE issues pointwise.
Today we are going to view some interesting model variation, that creates filtered statistics on-the-fly. I should give a disclaimer here.
Warning: All the information below is presented for purely educational and curiosity purposes. This is completely undocumented and unsupported and should not ever be used in production systems unless Microsoft support will recommend you. More to the point, the usage of this model variation may affect the overall server performance in a negative way. This should be used for experiments and in the test environment only.
It’s interesting reading, though do heed that warning. This also isn’t a quick operation (seeing as how the database engine is creating filtered statistics), so it’s not a first-best choice. But worth keeping your back pocket.