The Costs Of Statistics Updates With FULLSCAN

Kendra Little explains what happens when you update a table’s statistics with FULLSCAN:

On my test instance, the command that uses the default sampling takes 6 seconds to complete.

The command which adds “WITH FULLSCAN” takes just over five minutes to complete.

The reason is that those two little words can add a whole lot of extra IO to the work of updating statistics.

Kendra shows the query plans for each statistics update in some detail.  It’s a very interesting post, well worth taking the time to read.

NULL Values In The Histogram

Taiob Ali explains how NULL values show up in the SQL Server histogram when you create statistics:

In the density_vector section, the ‘All density’ value for column ‘PickingCompletedWhen’ is 0.0004705882, which was calculated from 1/(number of distinct values of column ‘PickingCompletedWhen’).
In this case that is 1/2125. All NULL values were considered as one. If you do a count of distinct values you will get a result of 2124. Reason is explained here. If you do a select of all distinct values, NULL will show along with the other 2124 values.

Taiob explains that there’s really nothing special about NULL when it comes to statistics.
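
The arithmetic in the quote can be checked with a few lines of Python. This is only a sketch of the counting logic, not SQL Server's implementation: the toy column and the function names are made up for illustration, and only the 2124 and 2125 figures come from Taiob's post.

```python
from typing import List, Optional


def count_distinct(values: List[Optional[str]]) -> int:
    """Mimic COUNT(DISTINCT col): NULLs (None) are skipped."""
    return len({v for v in values if v is not None})


def select_distinct(values: List[Optional[str]]) -> int:
    """Mimic the row count of SELECT DISTINCT col: NULL shows up as one value."""
    return len(set(values))


def all_density(distinct_values: int) -> float:
    """'All density' as described in the quote: 1 / number of distinct values."""
    return 1.0 / distinct_values


# Hypothetical toy column: two NULLs and two distinct dates.
sample = [None, None, "2016-01-01", "2016-01-02"]
toy_count = count_distinct(sample)    # NULLs ignored
toy_select = select_distinct(sample)  # NULLs collapse into a single value

# Figures from the post: 2124 non-NULL distinct values, plus NULL counted once.
density = all_density(2124 + 1)
print(toy_count, toy_select, f"{density:.10f}")
```

The one-row gap between the two counts is the same gap as between 2124 and 2125 in the post: the density calculation treats NULL as one more distinct value.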

Seeing Statistics In Execution Plans

Pedro Lopes announces that the statistics used to compile a plan are now available as part of the execution plan details:

OptimizerStatsUsage is available in cached plans, so getting the “estimated execution plan” and the “actual execution plan” will have this information.

In the above example, I see the ModificationCount is very high (almost as much as the table cardinality itself), and on closer observation, the statistic had been updated with NORECOMPUTE.

And looking at the Seek itself, there is a large skew between estimated and actual rows. In this case, I now know a good course of action is to update statistics. Doing so produces this new result: ModificationCount is back to zero and estimations are now correct.

This will be a good addition to SQL Server 2017.

The Story Of Nick

Kenneth Fisher tells the story of where the optimizer’s cost value comes from:

Obviously, it’s an important subject, right? And yet we keep seeing comments about how the cost is in seconds.

And to be fair, it is. It’s an estimate of how many seconds a query would take, if it was running on a developer’s workstation from back in the ’90s. At least that’s the story. In fact Dave Dustin posted this interesting story today:

The best way to think of cost is as a probabilistic, ordinal, unitless value:  3 might be greater than 2; 1000 is almost certainly greater than 2; and “2 what?” is undefined.

Cardinality Estimation On COUNT(*)

Paul White digs into how the cardinality estimator works with COUNT aggregations containing HAVING clauses:

The approach SQL Server takes is to assume that each group is most likely to contain the overall mean (average) number of rows. This is simply the cardinality divided by the number of unique values. For example, for 1000 rows with 20 unique values, SQL Server would assume that (1000 / 20) = 50 rows per group is the most likely value.

Turning back to our original example, this means that the computed count column is “most likely” to contain a value around (19614 / 575) ~= 34.1113. Since density is the reciprocal of the number of unique values, we can also express that as cardinality * density = (19614 * 0.00173913), giving a very similar result.

Definitely worth a careful read.
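
The two ways of expressing the estimate in the quote can be worked through in a short sketch. The function names are hypothetical and this only reproduces the arithmetic from the excerpt; it makes no claim about how the cardinality estimator is implemented internally.

```python
def mean_rows_per_group(cardinality: float, distinct_values: float) -> float:
    """Average rows per group: total rows divided by number of unique values."""
    return cardinality / distinct_values


def rows_per_group_from_density(cardinality: float, density: float) -> float:
    """Same quantity via density, where density = 1 / number of unique values."""
    return cardinality * density


# Simple example from the quote: 1000 rows, 20 unique values.
simple = mean_rows_per_group(1000, 20)

# Figures from the quote: 19614 rows, 575 unique values, density 0.00173913.
by_division = mean_rows_per_group(19614, 575)
by_density = rows_per_group_from_density(19614, 0.00173913)

print(simple, round(by_division, 4), round(by_density, 4))
```

The two results differ only because the density in the quote is rounded to eight decimal places, which is why Paul calls the result "very similar" rather than identical.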

Columnstore Indexes On Cloned Databases

Parikshit Savjani has a script to update columnstore index statistics before running DBCC CLONEDATABASE:

Unlike traditional B-tree indexes, when a columnstore index is created, no index statistics are created on the columns of the columnstore index. However, an empty stats object is created with the same name as the columnstore index, and an entry is added to sys.stats at the time of index creation. The stats object is populated on the fly when a query is executed against the columnstore index or when executing DBCC SHOW_STATISTICS against the columnstore index, but the columnstore index statistics aren’t persisted in storage. The index statistics are different from the auto-created statistics on the individual columns of columnstore indexes, which are generated on the fly and persisted in the statistics object. Since the index statistics are not persisted in storage, the cloned database will not contain those statistics, leading to inaccurate stats and different query plans when the same query is run against the database clone as opposed to the production database.

Click through for the script.

AG Secondary Stats Overwritten With Sample

Taiob Ali has run into an interesting issue:

Once I update my statistics with fullscan, within 10~20 seconds some of the statistics on the same table are getting updated on the secondary with a sample percent of rows. Meaning my best statistics are being overwritten with good (full vs sample) statistics. On the primary node, once I run “Update statistics Tablename with fullscan”, I see the following about statistics status.

Within 10~20 seconds of updating statistics on the primary node, if I check the status of the same on my secondary nodes, I see the fullscan statistics are replaced by sample statistics. Look at the rows_sampled and last_updated columns: you will see the sampled row count, and the last_updated time is within a few seconds of the update on the primary. The RowsModified column is still showing zero records.

It’s happening on an Availability Group secondary.  Taiob has a workaround, so read on for that.

Which Data Types Can Create Statistics?

Raul Gonzalez figures out which data types cannot be part of statistics:

Yeah, there you go, all these _WA_Sys_ stats tell me they have been automatically created (there is a flag in sys.stats if you don’t believe me) but I can see there are only 31, where I created 34 columns.

That’s funny, let’s see which data types did get statistics.

The results are pretty interesting.

Automating Stats Maintenance With Azure SQL DW

Grant Fritchey shows how to create automated statistics maintenance for an Azure SQL Data Warehouse database:

NOTE: The most important habit you can start with in Azure is putting everything into discrete, planned, Resource Groups. These make management so much easier.

Once the account is set, the first thing you need is to create a Runbook. There is a collection of them for your use within Azure. None of them are immediately applicable for what I need. I’m just writing a really simple Powershell script to do what I want:

Runbooks are an important part of Azure maintenance, and this is a gentle introduction to them.

Updating Multiple Statistics Concurrently

SQL Scotsman explains trace flag 7471, which allows you to update multiple statistics on a table concurrently:

Running multiple UPDATE STATISTICS commands for different statistics on a single table concurrently has been available under global Trace Flag 7471 since SQL Server 2014 SP1 CU6 and SQL Server 2016 CU1.  Microsoft have documented this trace flag here and here.

It sounds like, for the most part, you might not want this flag turned on, but read the whole post.

