This is tale of troubleshooting…
When you unexpectedly or intermittently encounter a change in query performance, it can be extremely frustrating and troublesome for a DBA or developer. If you’re not using Query Store, a third-party application, or your own method to capture query data, then you probably don’t have information about what query performance looked like when things were good…you just know how it’s running now. I was working with a customer of Tim’s last week that had been chasing this exact problem for, in their words, years. They had recently upgraded to SQL Server 2016, and the problem was still occurring.
Strangely, “blame the network” didn’t appear in Erin’s post, so I don’t know if it’s comprehensive.
Let’s list the basic known details for the possible solution(for the Enterprise Edition of the Sql Server that is):
– We can make the secondary replica readable and read the same data on it. (Not that you should do that by default, but if you really know what you are doing …)
– We can copy our object into the TempDB (yeah, your Multi-TB table is probably not the best candidate for this operation), or maybe into some other writable DB.
– We can write results in the shared folder between the replicas (let’s say in a text file into a File Share)
– We can export the BLOB object of the statistics out of the SQL Server
– We can import the BLOB object of the statistics into the statistics
Read the whole thing.
The first error above would be seen if you have a SQL Server Agent job that updates statistics. The second error is how it looks in an SSMS Query window.
The error message claims that
UPDATE STATISTICScan only be used on a columnstore index with the
STATS_STREAMoption. However, the Microsoft Docs UPDATE STATISTICS shows very “thin” documentation for the option, showing only these two tidbits:
Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.
Columnstore indexes really don’t want their stats updated, apparently, and will fight you tooth and nail to prevent it.
There has never been such information before!
We are just writing into it!
Why do we have those wonderful 1351498 logical reads ?
Are they actually writes ? And if they would be, would not it be correct to display them as physical accesses ?
The answer is rather simple and actually should have been expected.
We are inserting a big amount of data into an empty table with a Primary Key, which triggers a creation/update of the statistics and those are the reads of the statistics scan operation.
I hadn’t noticed that, but it is quite interesting.
SQL Server does have a fairly useful dynamic management view, or DMV, which provides insight that can be leveraged in this area. The DMV I’m talking about is the set of DMVs around missing indexes, consisting of
sys.dm_db_missing_index_details, etc. I’m not saying the missing indexes DMVs are a panacea that will enable you to fix every performance situation you run into, but they can be useful if you know where to look. This post doesn’t go into a lot of depth about how to use those DMVs for the purpose of actually creating indexes, however I will show you how you can create multi-column stats objects as an interim performance booster while evaluating the need for those indexes.
I’ve never had great luck with multi-column stats versus simply creating indexes but that could simply be a case of me doing it wrong.
When building indexes for your queries, the order of your index key columns matters. SQL Server can make the most effective use of an index if the data in that index is stored in the same order as what your query requires for a join, where predicate, grouping, or order by clause.
But if your query requires multiple key columns because of multiple predicates (eg. WHERE Color = ‘Red’ AND Size= ‘Medium’), what order should you define the columns in your index key column definition?
One of my favorite books for query tuning is a bit long in the tooth at this point but remains quite relevant, and a key point there is to look for ways to drop the largest percent of rows as soon as possible. This applies for good indexes as well: they’ll let you ignore as large a percentage of your irrelevant data as you can, as soon as possible.
In RDBMS, we sometimes use the term skew colloquially to mean the same thing as non-uniform distribution, i.e. a normal distribution would also be skewed. We simply mean that some values appear more often than others. Thus, I will put the term “skew” in double quotes in this article. While your RDBMS’s statistics contain this information once they are calculated, we can also detect such “skew” manually in ad-hoc queries using percentiles, which are defined in the SQL standard and supported in a variety of databases, as ordinary aggregate functions, including:
– SQL Server (regrettably, only as window functions)
As Lukas implies, SQL Server is a step behind in terms of calculating percentiles, and calculating several percentiles over a large data set will be slow. Very slow. Though batch mode processing in 2019 does help here.
I have a huge love for the DBCC CLONEDATABASE command – it has been made available (backported) to every SQL Server version starting with SQL Server 2012, since the original release in SQL Server 2014, while being constantly improved in the Service Packs and Cumulative Updates.
This blog post is focusing on the Database Cloning improvement in the SQL Server 2019 that is already available in the public CTP 2.0 – the possibility of the automated statistics extraction for the Columnstore Indexes.
Well, there was quite a significant problem with the Columnstore Indexes previously – the statistics for them were not extracted into the cloned database, unless you did created the statistics in the most recent step before Database cloning.
Click through for more details and a comparison between SQL Server versions.
Let me rephrase: before you even start playing around with statistics, make sure you haven’t taken away SQL Server’s ability to do this for you.
I like to make fun of a lot of SQL Server’s built-in “auto-tuning” capabilities that do a pretty terrible job. Cost Threshold for Parallelism of 5? MAXDOP 0? Missing index hints that include every column in the table? Oooookeydokey.
But there are some things that SQL Server has been taking care of for years, and automatically creating statistics is one of ’em.
There are edge cases where statistics auto-creation isn’t the best thing, but for the great majority of cases, it is a big positive.
I had a lot of issues when I created my first one, and after discussing with some folks, they had the same issues. I searched for the best blog posts that I could find on the subject, and the one I LOVED the most was here: Arctic DBA. He broke it down so simply, that I finally created my own pseudo installer and I wanted to share it with all of you. Please, bear in mind, these code snippets may fail at anytime due to changes in Azure.
These next steps assume the following:
You have created/configured your Azure Automation Account and credential to use to execute this runbook.
Read on for a reasonably short Powershell script and a modified version of Ola Hallengren’s index maintenance procedures.