Press "Enter" to skip to content

Category: Statistics

Workarounds for Updating Stats on Secondaries

Niko Neugebauer wants statistics updates on tables running on readable Availability Group secondary nodes:

Let’s list the basic known details for the possible solution(for the Enterprise Edition of the Sql Server that is):
– We can make the secondary replica readable and read the same data on it. (Not that you should do that by default, but if you really know what you are doing …)
– We can copy our object into the TempDB (yeah, your Multi-TB table is probably not the best candidate for this operation), or maybe into some other writable DB.
– We can write results in the shared folder between the replicas (let’s say in a text file into a File Share)
– We can export the BLOB object of the statistics out of the SQL Server
– We can import the BLOB object of the statistics into the statistics

Read the whole thing.

Comments closed

Errors Updating Stats on Columnstore Indexes

Max Vernon walks us through some problems trying to update statistics on columnstore indexes:

The first error above would be seen if you have a SQL Server Agent job that updates statistics. The second error is how it looks in an SSMS Query window.

The error message claims that UPDATE STATISTICS can only be used on a columnstore index with the STATS_STREAM option. However, the Microsoft Docs UPDATE STATISTICS shows very “thin” documentation for the option, showing only these two tidbits:

<update_stats_stream_option>
Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.

Columnstore indexes really don’t want their stats updated, apparently, and will fight you tooth and nail to prevent it.

Comments closed

SET STATISTICS IO And Automated Statistics Updates

Niko Neugebauer shows us something new in SQL Server 2019:

There has never been such information before!
We are just writing into it!
Why do we have those wonderful 1351498 logical reads ?
Are they actually writes ? And if they would be, would not it be correct to display them as physical accesses ?

The answer is rather simple and actually should have been expected.
We are inserting a big amount of data into an empty table with a Primary Key, which triggers a creation/update of the statistics and those are the reads of the statistics scan operation. 

I hadn’t noticed that, but it is quite interesting.

Comments closed

Creating Multi-Column Statistics From Missing Index DMVs

Max Vernon shows how you can use the missing index DMVs to find potential candidates for multi-column statistics:

SQL Server does have a fairly useful dynamic management view, or DMV, which provides insight that can be leveraged in this area. The DMV I’m talking about is the set of DMVs around missing indexes, consisting of sys.dm_db_missing_index_groupssys.dm_db_missing_index_details, etc. I’m not saying the missing indexes DMVs are a panacea that will enable you to fix every performance situation you run into, but they can be useful if you know where to look. This post doesn’t go into a lot of depth about how to use those DMVs for the purpose of actually creating indexes, however I will show you how you can create multi-column stats objects as an interim performance booster while evaluating the need for those indexes.

I’ve never had great luck with multi-column stats versus simply creating indexes but that could simply be a case of me doing it wrong.

1 Comment

The Importance of Cardinality

Bert Wagner shows us why cardinality is important to understand when indexing data:

When building indexes for your queries, the order of your index key columns matters.  SQL Server can make the most effective use of an index if the data in that index is stored in the same order as what your query requires for a join, where predicate, grouping, or order by clause.

But if your query requires multiple key columns because of multiple predicates (eg. WHERE Color = ‘Red’ AND Size= ‘Medium’), what order should you define the columns in your index key column definition?

One of my favorite books for query tuning is a bit long in the tooth at this point but remains quite relevant, and a key point there is to look for ways to drop the largest percent of rows as soon as possible. This applies for good indexes as well: they’ll let you ignore as large a percentage of your irrelevant data as you can, as soon as possible.

Comments closed

Calculating Skew In SQL

Lukas Eder shows how you can use PERCENTILE_DISC to calculate skewness in SQL:

In RDBMS, we sometimes use the term skew colloquially to mean the same thing as non-uniform distribution, i.e. a normal distribution would also be skewed. We simply mean that some values appear more often than others. Thus, I will put the term “skew” in double quotes in this article. While your RDBMS’s statistics contain this information once they are calculated, we can also detect such “skew” manually in ad-hoc queries using percentiles, which are defined in the SQL standard and supported in a variety of databases, as ordinary aggregate functions, including:
– Oracle
– PostgreSQL
– SQL Server (regrettably, only as window functions)

As Lukas implies, SQL Server is a step behind in terms of calculating percentiles, and calculating several percentiles over a large data set will be slow. Very slow. Though batch mode processing in 2019 does help here.

Comments closed

Cloning And Columnstore Statistics

Niko Neugebauer points out a fix in SQL Server 2019:

I have a huge love for the DBCC CLONEDATABASE command – it has been made available (backported) to every SQL Server version starting with SQL Server 2012, since the original release in SQL Server 2014, while being constantly improved in the Service Packs and Cumulative Updates.

This blog post is focusing on the Database Cloning improvement in the SQL Server 2019 that is already available in the public CTP 2.0 – the possibility of the automated statistics extraction for the Columnstore Indexes.
WHY ?
Well, there was quite a significant problem with the Columnstore Indexes previously – the statistics for them were not extracted into the cloned database, unless you did created the statistics in the most recent step before Database cloning.

Click through for more details and a comparison between SQL Server versions.

Comments closed

The Value Of Auto-Created Statistics

Brent Ozar is here to praise statistics auto-creation:

Let me rephrase: before you even start playing around with statistics, make sure you haven’t taken away SQL Server’s ability to do this for you.

I like to make fun of a lot of SQL Server’s built-in “auto-tuning” capabilities that do a pretty terrible job. Cost Threshold for Parallelism of 5? MAXDOP 0? Missing index hints that include every column in the table? Oooookeydokey.

But there are some things that SQL Server has been taking care of for years, and automatically creating statistics is one of ’em.

There are edge cases where statistics auto-creation isn’t the best thing, but for the great majority of cases, it is a big positive.

Comments closed

Configuring An Azure Runbook For Index Maintenance

Jim Donahoe explains how to perform index and statistics maintenance for Azure SQL Database, where you don’t have SQL Agent available:

I had a lot of issues when I created my first one, and after discussing with some folks, they had the same issues.  I searched for the best blog posts that I could find on the subject, and the one I LOVED the most was here: Arctic DBA.  He broke it down so simply, that I finally created my own pseudo installer and I wanted to share it with all of you.  Please, bear in mind, these code snippets may fail at anytime due to changes in Azure.

**IMPORTANT**

These next steps assume the following:

You have created/configured your Azure Automation Account and credential to use to execute this runbook.

Read on for a reasonably short Powershell script and a modified version of Ola Hallengren’s index maintenance procedures.

Comments closed

Custom Statistics Block Column Alteration DDL

Max Vernon demonstrates that custom statistics and prevent you for modifying a column:

Interestingly, if SQL Server has auto-created a stats object on a column, and you subsequently modify that column, you receive no such error. SQL Server silently drops the statistics object, and modifies the column. The auto-created stats object is *not* automatically recreated until a query is executed that needs the stats object. This difference in how auto-created stats and manually created stats are treated by the engine can make for some confusion.

Just one more thing to think about if you manually create statistics on tables.  But at least the error message is clear.

Comments closed