
Category: Statistics

Updating Stats With Ola’s Scripts

Erin Stellato shows us a smarter way to update statistics using Ola Hallengren’s index optimization script:

With this option, if no rows have changed, the statistic will not be updated.  If one or more rows have changed, the statistic will be updated.

Since the release of SP1 for 2012, this has been my only challenge with Ola’s scripts.  In SQL Server 2008R2 SP2 and SQL Server 2012 SP1 they introduced the sys.dm_db_stats_properties DMV, which tracks modifications for each statistic.  I have written custom scripts to use this information to determine if stats should be updated, which I’ve talked about here.  Jonathan has also modified Ola’s script for a few of our customers to look at sys.dm_db_stats_properties to determine if enough data had changed to update stats, and a long time ago we had emailed Ola to ask if he could include an option to set a threshold.  Good news, that option now exists!

Keeping statistics up to date is a nice way of quietly improving performance in a system.
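For reference, here is a minimal sketch of what that looks like in practice, based on my reading of Ola's documented parameters (check the documentation for the version you're running; @StatisticsModificationLevel is the newer threshold option and is an alternative to @OnlyModifiedStatistics):

    EXECUTE dbo.IndexOptimize
        @Databases = 'USER_DATABASES',
        @FragmentationLow = NULL,        -- skip index maintenance, statistics only
        @FragmentationMedium = NULL,
        @FragmentationHigh = NULL,
        @UpdateStatistics = 'ALL',
        @OnlyModifiedStatistics = 'Y';   -- skip statistics with zero modifications
        -- or, instead of @OnlyModifiedStatistics, set a change threshold:
        -- @StatisticsModificationLevel = '5'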


The Value Of Statistics In SQL Server

Monica Rathbun walks us through the benefits of having statistics on tables in SQL Server:

Statistics are made up of three parts. Each part tells the optimizer important information regarding the makeup of the table’s data distribution.

Header – Last Time Stats were updated and number of sample rows

Density Vector – Uniqueness of the columns or set of columns

Histogram – Data’s distribution and frequency of distinct values

Let’s look at a Header, Density and Histogram example.

You can see what a statistic is broken down into by using DBCC SHOW_STATISTICS. All field definitions are taken from MSDN.

This is from the AdventureWorks2016CTP3 sample database, if you want to follow along. Using the Sales.SalesOrderDetail table, let’s look at the stats and see what they show us.

Read the whole thing.
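If you want to poke at those three sections yourself, a minimal sketch against the same table looks like this (the statistic name here is the primary key index in AdventureWorks; substitute any name from sys.stats):

    DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail',
                          'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID');

    -- Or return a single section at a time:
    DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail',
                          'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID')
        WITH HISTOGRAM;   -- also: STAT_HEADER, DENSITY_VECTOR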


Update Statistics After An Upgrade

Erin Stellato gives us some good life advice:

There are a variety of methods we use for helping customers upgrade to a new SQL Server version, and one question we often get asked is whether or not statistics need to be updated as part of the upgrade process.

tl;dr

Yes.  Update statistics after an upgrade.  Further, if you’re upgrading to 2012 or higher from an earlier version, you should rebuild your indexes (which will update index statistics, so then you just need to update column statistics).

Make this one of your “Not too long; did read” posts of the day.
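As a rough sketch of that advice for a single table (dbo.SomeTable is a placeholder; in practice you'd drive this from sys.tables or a maintenance solution):

    ALTER INDEX ALL ON dbo.SomeTable REBUILD;                -- refreshes index statistics with a full scan
    UPDATE STATISTICS dbo.SomeTable WITH FULLSCAN, COLUMNS;  -- then update the remaining column statistics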


Methods For Capturing Cardinality Estimate Statistics

Monica Rathbun gives us five methods for finding cardinality estimate values when running a SQL Server query:

A second option is to use statistics profiling. This was introduced in SQL Server 2014 and is easily set by using SET STATISTICS PROFILE ON, or you can enable query profiling globally using DBCC TRACEON (7412, -1). This trace flag is only available in SQL Server 2016 SP1 and above. By selecting from the dynamic management view (DMV) sys.dm_exec_query_profiles, you can monitor query execution progress in real time while the query is running. This option will return estimated and actual rows by operator.

Click through for the full set of methods.
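A minimal sketch of the profiling option Monica describes (remember that trace flag 7412 needs SQL Server 2016 SP1 or later, and leaving it on globally adds some overhead):

    DBCC TRACEON (7412, -1);   -- enable lightweight query profiling globally

    -- While a query runs in another session, watch estimated vs. actual rows per operator:
    SELECT session_id, node_id, physical_operator_name,
           estimate_row_count, row_count
    FROM sys.dm_exec_query_profiles
    ORDER BY session_id, node_id;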


Auto-Generated Filtered Stats

Dmitry Piliugin shows one way that we can optimize a query whose main problem is poor row estimates:

A model variation is a new concept in the 2014 cardinality estimation framework that allows us to easily turn some model assumptions and cardinality estimation algorithms on and off. Model variations are based on a mechanism of pluggable heuristics and may be used in special cases. I think they are left for Microsoft support to be able to address some clients’ CE issues pointwise.

Today we are going to look at an interesting model variation that creates filtered statistics on the fly. I should give a disclaimer here.

Warning: All the information below is presented for purely educational and curiosity purposes. This is completely undocumented and unsupported and should never be used in production systems unless Microsoft support recommends it. More to the point, the usage of this model variation may affect overall server performance in a negative way. This should be used for experiments and in the test environment only.

It’s interesting reading, though do heed that warning.  This also isn’t a quick operation (seeing as how the database engine is creating filtered statistics), so it’s not a first-best choice.  But it’s worth keeping in your back pocket.


Collecting Statistics Usage Info

Grant Fritchey shows us how (safely) to collect data on statistics usage:

Years ago I was of the opinion that it wasn’t really possible to see the statistics used in the generation of a query plan. If you read the comments here, I was corrected of that notion. However, I’ve never been a fan of using undocumented trace flags. Yeah, super heroes like Fabiano Amorim and Paul White use them, but for regular individuals like me, it seems like a recipe for disaster. Further, if you read about these trace flags, they cause problems on your system. Even Fabiano was getting the occasional crash.

So, what’s a safe way to get that information? First up, Extended Events. If you use the auto_stats event, you can see the statistics getting created and getting loaded and used. Even if they’re not created, you can see them getting loaded. It’s an easy way to quickly see which statistics were used to generate a plan. One note, you’ll have to compile or recompile a given query to see this in action.

Read on for more.
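Here is a minimal sketch of such a session; the session name, file target, and database filter are my own choices rather than Grant's exact setup:

    CREATE EVENT SESSION [StatsUsage] ON SERVER
    ADD EVENT sqlserver.auto_stats
    (
        ACTION (sqlserver.sql_text, sqlserver.database_name)
        WHERE (sqlserver.database_name = N'AdventureWorks')   -- hypothetical filter
    )
    ADD TARGET package0.event_file (SET filename = N'StatsUsage');
    GO
    ALTER EVENT SESSION [StatsUsage] ON SERVER STATE = START;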


Understanding DBCC SHOW_STATISTICS Outputs

Bill Wolf continues his series on statistics by looking at what DBCC SHOW_STATISTICS gives you:

When I was putting together the lesson plans for this, I wanted to make my own query for the comparisons, not borrow one from another site or blog.  Yes, I borrow plenty, but I wanted this to be mine.  When I was presenting my “code tuning” class, I had recently upgraded my instance from 2012 to 2017.  I had also put my database into 2017 compatibility mode.  I had used this query to show that unions that are intensive can cause issues with tempdb and cause spill over.  To my “joy”, when I ran the query in the class I did not get the tempdb spillover.  And right then I realized that I was not in Kansas (2012 compatibility) any longer.  But this proved to be opportune for the statistics/optimizer comparison.

Read on for a discussion of the cardinality estimator as well.
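If you want to reproduce the before-and-after he describes, the compatibility level is the switch; under 110 the optimizer uses the legacy cardinality estimator, under 140 the SQL Server 2017 one. The database name here is a placeholder:

    SELECT name, compatibility_level FROM sys.databases WHERE name = N'MyDatabase';

    ALTER DATABASE [MyDatabase] SET COMPATIBILITY_LEVEL = 110;  -- SQL Server 2012 behavior
    ALTER DATABASE [MyDatabase] SET COMPATIBILITY_LEVEL = 140;  -- SQL Server 2017 behavior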


Row Counts From Statistics In Azure DW

Derik Hammer has a script to estimate row counts in an Azure SQL Data Warehouse table:

Azure SQL Data Warehouse is a massively parallel processing (MPP) architecture designed for large-scale data warehouses. An MPP system creates logical / physical slices of the data. In SQL Data Warehouse’s case, the data has 60 logical slices, at all performance tiers. This means that a single table can have up to 60 different object_ids. This is why, in SQL Data Warehouse, there is the concept of physical and logical object_ids along with physical names.

Below is a query for finding row counts of tables in SQL Data Warehouse which accounts for the differences in architecture between my earlier script, written for SQL Server, and SQL Data Warehouse.

Click through for the script.
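For flavor, here is the commonly cited pattern for rolling the per-distribution counts back up to a logical table. This is not Derik's script, and the pdw view and column names are from memory, so verify them in your environment:

    SELECT  s.name AS schema_name,
            t.name AS table_name,
            SUM(ps.row_count) AS row_count
    FROM    sys.tables AS t
    JOIN    sys.schemas AS s            ON t.schema_id = s.schema_id
    JOIN    sys.pdw_table_mappings AS m ON t.object_id = m.object_id
    JOIN    sys.pdw_nodes_tables AS nt  ON m.physical_name = nt.name
    JOIN    sys.dm_pdw_nodes_db_partition_stats AS ps
                ON  nt.object_id = ps.object_id
                AND nt.pdw_node_id = ps.pdw_node_id
                AND nt.distribution_id = ps.distribution_id
    WHERE   ps.index_id < 2             -- heap or clustered index only
    GROUP BY s.name, t.name;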


Updating Statistics On System Tables

Dan Guzman shows that you can update statistics on system tables:

The solution in many cases is to simply update statistics on the underlying system table indexes used by the problem DMV queries. This can be done selectively by identifying the system table indexes referenced in seek and scan operators of the problem query’s execution plan and then executing UPDATE STATISTICS on each index. However, the task is somewhat tedious.

Alternatively, one can simply update stats on all the system tables. Below is a script that generates and executes DDL to update stats on all non-empty system tables, making quick work of this.

Click through for the script, as well as an interesting note if you try to use constructs like @sql = @sql + N’some string’ in your code.
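As a sketch of the blanket approach (not Dan's script; this one uses STRING_AGG, which needs SQL Server 2017 or later, precisely to avoid the @sql = @sql + N'some string' aggregation pattern he cautions about):

    DECLARE @sql nvarchar(max), @nl nchar(1) = NCHAR(10);

    SELECT @sql = STRING_AGG(
               CAST(N'UPDATE STATISTICS ' + QUOTENAME(SCHEMA_NAME(o.schema_id))
                    + N'.' + QUOTENAME(o.name) + N';' AS nvarchar(max)),
               @nl)
    FROM sys.objects AS o
    WHERE o.type = 'S'    -- system base tables
      AND EXISTS (SELECT 1 FROM sys.partitions AS p
                  WHERE p.object_id = o.object_id AND p.rows > 0);

    PRINT @sql;                    -- inspect the generated statements first
    EXEC sys.sp_executesql @sql;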


Base Versus Simple Containment

Joe Obbish takes a crack at explaining the difference between base containment and simple containment for cardinality estimation:

We know that the first query will return 500k rows and the second query will return 0 rows. However, can SQL Server know that? Each statistics object only contains information about its own column. There’s no correlation between the UNIQUE_ID and MOD_FILTER columns, so there isn’t a way for SQL Server to know that the queries will return different estimates. The query optimizer can create an estimate based on the filters on the WHERE clause and on the histograms of the join columns, but there’s no foolproof way to do that calculation. The presence of the filters introduces uncertainty into the estimate, even with statistics that perfectly describe the data for each column. The containment assumption is all about the modeling assumption that SQL Server has to make to resolve that uncertainty.

It’s an interesting post aimed at trying to get you to think like a simplified cardinality estimator.  SQL Server doesn’t behave exactly like this, but it’s a good mental reference point.
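To make that concrete, here is a hypothetical pair in the spirit of Joe's setup (table and column names are mine, not his exact demo): the join and the individual filters look the same to the optimizer, yet one query returns every matching row and the other returns none.

    -- Filters happen to line up row-for-row: the join returns ~500k rows.
    SELECT COUNT(*)
    FROM dbo.TableA AS a
    JOIN dbo.TableB AS b ON a.UNIQUE_ID = b.UNIQUE_ID
    WHERE a.MOD_FILTER = 0 AND b.MOD_FILTER = 0;

    -- Filters never co-occur on matching rows: the join returns 0 rows,
    -- but per-column statistics give the optimizer no way to know that.
    SELECT COUNT(*)
    FROM dbo.TableA AS a
    JOIN dbo.TableB AS b ON a.UNIQUE_ID = b.UNIQUE_ID
    WHERE a.MOD_FILTER = 0 AND b.MOD_FILTER = 1;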
