SET STATISTICS IO And Automated Statistics Updates

Niko Neugebauer shows us something new in SQL Server 2019:

There has never been such information before!
We are just writing into it!
Why do we have those wonderful 1351498 logical reads ?
Are they actually writes ? And if they would be, would not it be correct to display them as physical accesses ?

The answer is rather simple and actually should have been expected.
We are inserting a big amount of data into an empty table with a Primary Key, which triggers a creation/update of the statistics and those are the reads of the statistics scan operation. 

I hadn’t noticed that, but it is quite interesting.

Creating Multi-Column Statistics From Missing Index DMVs

Max Vernon shows how you can use the missing index DMVs to find potential candidates for multi-column statistics:

SQL Server does have a fairly useful dynamic management view, or DMV, which provides insight that can be leveraged in this area. The DMV I’m talking about is the set of DMVs around missing indexes, consisting of sys.dm_db_missing_index_groupssys.dm_db_missing_index_details, etc. I’m not saying the missing indexes DMVs are a panacea that will enable you to fix every performance situation you run into, but they can be useful if you know where to look. This post doesn’t go into a lot of depth about how to use those DMVs for the purpose of actually creating indexes, however I will show you how you can create multi-column stats objects as an interim performance booster while evaluating the need for those indexes.

I’ve never had great luck with multi-column stats versus simply creating indexes but that could simply be a case of me doing it wrong.

The Importance of Cardinality

Bert Wagner shows us why cardinality is important to understand when indexing data:

When building indexes for your queries, the order of your index key columns matters.  SQL Server can make the most effective use of an index if the data in that index is stored in the same order as what your query requires for a join, where predicate, grouping, or order by clause.

But if your query requires multiple key columns because of multiple predicates (eg. WHERE Color = ‘Red’ AND Size= ‘Medium’), what order should you define the columns in your index key column definition?

One of my favorite books for query tuning is a bit long in the tooth at this point but remains quite relevant, and a key point there is to look for ways to drop the largest percent of rows as soon as possible. This applies for good indexes as well: they’ll let you ignore as large a percentage of your irrelevant data as you can, as soon as possible.

Calculating Skew In SQL

Lukas Eder shows how you can use PERCENTILE_DISC to calculate skewness in SQL:

In RDBMS, we sometimes use the term skew colloquially to mean the same thing as non-uniform distribution, i.e. a normal distribution would also be skewed. We simply mean that some values appear more often than others. Thus, I will put the term “skew” in double quotes in this article. While your RDBMS’s statistics contain this information once they are calculated, we can also detect such “skew” manually in ad-hoc queries using percentiles, which are defined in the SQL standard and supported in a variety of databases, as ordinary aggregate functions, including:
– Oracle
– PostgreSQL
– SQL Server (regrettably, only as window functions)

As Lukas implies, SQL Server is a step behind in terms of calculating percentiles, and calculating several percentiles over a large data set will be slow. Very slow. Though batch mode processing in 2019 does help here.

Cloning And Columnstore Statistics

Niko Neugebauer points out a fix in SQL Server 2019:

I have a huge love for the DBCC CLONEDATABASE command – it has been made available (backported) to every SQL Server version starting with SQL Server 2012, since the original release in SQL Server 2014, while being constantly improved in the Service Packs and Cumulative Updates.

This blog post is focusing on the Database Cloning improvement in the SQL Server 2019 that is already available in the public CTP 2.0 – the possibility of the automated statistics extraction for the Columnstore Indexes.
Well, there was quite a significant problem with the Columnstore Indexes previously – the statistics for them were not extracted into the cloned database, unless you did created the statistics in the most recent step before Database cloning.

Click through for more details and a comparison between SQL Server versions.

The Value Of Auto-Created Statistics

Brent Ozar is here to praise statistics auto-creation:

Let me rephrase: before you even start playing around with statistics, make sure you haven’t taken away SQL Server’s ability to do this for you.

I like to make fun of a lot of SQL Server’s built-in “auto-tuning” capabilities that do a pretty terrible job. Cost Threshold for Parallelism of 5? MAXDOP 0? Missing index hints that include every column in the table? Oooookeydokey.

But there are some things that SQL Server has been taking care of for years, and automatically creating statistics is one of ’em.

There are edge cases where statistics auto-creation isn’t the best thing, but for the great majority of cases, it is a big positive.

Configuring An Azure Runbook For Index Maintenance

Jim Donahoe explains how to perform index and statistics maintenance for Azure SQL Database, where you don’t have SQL Agent available:

I had a lot of issues when I created my first one, and after discussing with some folks, they had the same issues.  I searched for the best blog posts that I could find on the subject, and the one I LOVED the most was here: Arctic DBA.  He broke it down so simply, that I finally created my own pseudo installer and I wanted to share it with all of you.  Please, bear in mind, these code snippets may fail at anytime due to changes in Azure.


These next steps assume the following:

You have created/configured your Azure Automation Account and credential to use to execute this runbook.

Read on for a reasonably short Powershell script and a modified version of Ola Hallengren’s index maintenance procedures.

Custom Statistics Block Column Alteration DDL

Max Vernon demonstrates that custom statistics and prevent you for modifying a column:

Interestingly, if SQL Server has auto-created a stats object on a column, and you subsequently modify that column, you receive no such error. SQL Server silently drops the statistics object, and modifies the column. The auto-created stats object is *not* automatically recreated until a query is executed that needs the stats object. This difference in how auto-created stats and manually created stats are treated by the engine can make for some confusion.

Just one more thing to think about if you manually create statistics on tables.  But at least the error message is clear.

What’s Special About stats_id = 1

Wayne Sheffield explains what makes stats_id = 1 special, as well as a relationship between stats_id and index_id in SQL Server:

If you were to look at sys.indexes, you would see that these two indexes use index_id values of 1 and 3. The value 2 is skipped. It’s not because there used to be an index that was deleted after the index_id 3 index was created. It’s simply because of the relationship that index_id = stats_id, and there is already a statistic with stats_id = 2. When creating the index for the primary key, index_id 2 had to be skipped.

Check it out for additional insights.

Retrieving Statistic Use From Query Plan XML

Lonny Niederstadt shows us how to retrieve stats usage details from SQL Server query plans if trace flag 8666 is enabled:

Years ago someone said “Hey – why not drop auto-created stats, since the stats you need will just get created again and you’ll end up getting rid of those you no longer need.”   That *may* be a reasonable step on some systems.  If the risk of bad plans on first execution of a query needed stats that have been dropped is too high, its a bad deal.  If the potential concurrent cost of auto-creating dropped stats is too high, that’s a bad deal.  What about analyzing query plans over some period of time to see which stats are actually used in those plans?  Then auto-stats which aren’t used in that set of plans could be dropped.

That type of stats analysis could have other uses, too.  Prioritizing stats manual stats updates in regular maintenance comes to mind.  Or, determining what stats to create/update on an Always On Availability Group primary based on secondary activity.  And troubleshooting problem queries or identifying suspicious “watchlist” stats based on highly variable queries/plans they are involved with.

So I created this blog post almost 4 years ago.  And now I’ll plead with you to not use the query there… it’s awful.  If you want to query trace flag 8666 style stats from plan XML, please start from the query in this post instead – its much more well behaved 🙂

Read on for the script.


March 2019
« Feb