Press "Enter" to skip to content

Category: Statistics

Determining Statistics Utilization

Deborah Melkin shows us how to see if a particular statistic is in use:

You know those tweets that you see once but can never find again? I remember seeing one a while ago where someone tweeted to #sqlhelp asking if the internal inserted and deleted tables had statistics or if they were like table variables, which didn’t.

This is a great question in general. But then it got me thinking – how do you prove this? I wanted to know the answer as well so I decided to look into this. And I went down the wrong sort of rabbit hole trying to figure this out. Eventually I talked to a friend about this and got pointed in the right direction…

And the answer to how you find which statistics are used is…?

Read on for the answer and several examples.

Comments closed

Decoding Statistics Names

Jason Brimhall explains how SQL Server comes up with names for auto-created statistics:

Every now and again I am asked about the meaning behind the automatically generated names for statistics in SQL Server. The quick answer is short, sweet and really easy. I give them a quick explanation and then often refer them to the blog post by Paul Randal on the topic.

The better answer is to show them what the auto-generated names really mean, alongside the great explanation from Paul. Finally, after years of the topic being on my backlog, I am sharing a script that will help decode those names and help to prove out fully what’s in a statistic name.

The proof is in the SQL; click through to see it.

Comments closed

The Costs of Bad Statistics

Monica Rathbun explains what happens when statistics go wrong:

Over Estimations of Rows (Actual > Estimated) leads to:

– Selection of parallel plan when a serial plan might be more optimal
– Inappropriate join strategy selections
– Inefficient Index Navigation (scan verses seek)
– Inflated Memory Grants

Read the whole thing. The optimizer doesn’t get to look at actual data when determining plans (save for something like adaptive query join processing, but that’s pretty rare), so statistics are its link to reality.

Comments closed

Estimates outside the Histogram Range

Josh Darnell shows us how SQL Server calculates estimates for input values outside of the range of your relevant statistic’s histogram:

I have the impression that CSelCalcColumnInInterval “fails” if the predicate doesn’t fall within any of the histogram intervals. The estimation logic then chooses to try the CSelCalcAscendingKeyFiltercalculator (a reference to the “ascending key problem”) if the predicate is specifically higher than the last histogram interval.

Josh includes a couple of demos as well, so check them out.

Comments closed

Waiting on Stats Refreshes

Erik Darling looks at a new wait type:

I mean ever really wondered just how long a query of yours waited on stats to automatically update before running?

Have you ever been absolutely, positively sure that those gosh dang darn stats updates were putting your query on the trolley to Slowsville?

Your wishes will be 100% granted in SQL Server 2019.

This is a wait type that I’d consider useful but hopefully uncommon.

Comments closed

Statistics and Multiple Single-Column Indexes

Erik Darling is fusing together queries like Dr. Frankenstein in his lab:

You may have noticed that both queries get pretty bad estimates. You might even be thinking about leaving me a comment to update stats.

The thing is that I created these indexes, which means they get stats built with a full scan, and it’s a demo database where nothing changes.

We just get unfortunate histograms, in this case. If I create very specific filtered statistics, both plans perform a key lookup.

I snipped this part because it’s so ingrained in DBAs that when performance is bad, updating statistics is a panacea.

Comments closed

Explaining Column Statistics

Bert Wagner takes us through column statistics in SQL Server:

Statistics are the primary meta data used by the query optimizer to help estimate the costs of retrieving data for a specific query plan.

The reason SQL Server uses statistics is to avoid having to calculate information about the data during query plan generation. For example, you don’t want to have the optimizer scan a billion row table to learn information about it, only to then scan it again when executing the actual query.

Instead, it’s preferable to have those summary statistics pre-calculated ahead of time. This allows the query optimizer to quickly generate and compare multiple candidate plans before choosing one to actually execute.

These statistics aren’t perfect, but life is almost always better when you have accurate, up-to-date statistics on relevant columns.

Comments closed

Troubleshooting Query Performance Changes

Erin Stellato walks us through a troubleshooting guide when users complain about poorly-performing queries:

This is tale of troubleshooting…

When you unexpectedly or intermittently encounter a change in query performance, it can be extremely frustrating and troublesome for a DBA or developer. If you’re not using Query Store, a third-party application, or your own method to capture query data, then you probably don’t have information about what query performance looked like when things were good…you just know how it’s running now. I was working with a customer of Tim’s last week that had been chasing this exact problem for, in their words, years. They had recently upgraded to SQL Server 2016, and the problem was still occurring.

Strangely, “blame the network” didn’t appear in Erin’s post, so I don’t know if it’s comprehensive.

Comments closed

Workarounds for Updating Stats on Secondaries

Niko Neugebauer wants statistics updates on tables running on readable Availability Group secondary nodes:

Let’s list the basic known details for the possible solution(for the Enterprise Edition of the Sql Server that is):
– We can make the secondary replica readable and read the same data on it. (Not that you should do that by default, but if you really know what you are doing …)
– We can copy our object into the TempDB (yeah, your Multi-TB table is probably not the best candidate for this operation), or maybe into some other writable DB.
– We can write results in the shared folder between the replicas (let’s say in a text file into a File Share)
– We can export the BLOB object of the statistics out of the SQL Server
– We can import the BLOB object of the statistics into the statistics

Read the whole thing.

Comments closed

Errors Updating Stats on Columnstore Indexes

Max Vernon walks us through some problems trying to update statistics on columnstore indexes:

The first error above would be seen if you have a SQL Server Agent job that updates statistics. The second error is how it looks in an SSMS Query window.

The error message claims that UPDATE STATISTICS can only be used on a columnstore index with the STATS_STREAM option. However, the Microsoft Docs UPDATE STATISTICS shows very “thin” documentation for the option, showing only these two tidbits:

<update_stats_stream_option>
Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.

Columnstore indexes really don’t want their stats updated, apparently, and will fight you tooth and nail to prevent it.

Comments closed