
Category: Statistics

Auto-Update Stats Threshold Change

Erik Darling points out that automatic statistics updates will happen more frequently in 2016 for large tables than in prior versions:

Here’s an abridged version of 10-20 million and 30-40 million rows, and how many modifications they took before a stats update occurred. If you follow the PercentMod column down, the returns diminish a bit the higher up you get. I’m not saying that I’d prefer to wait for 20% + 500 rows to modify, by any stretch. My only point here is that there’s not a set percentage to point to.

And, because you’re probably wondering, turning on Trace Flag 2371 in 2016 doesn’t make any difference.

This is a good change, though as Erik points out, if you’re managing very large tables, you might already have the trace flag on and therefore won’t see any difference.
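If you want to watch how close a table is to triggering an auto-update yourself, sys.dm_db_stats_properties (available from 2008 R2 SP2 and 2012 SP1 onward) exposes the modification counter. A minimal sketch, with dbo.BigTable standing in as a hypothetical table name:

-- For each statistic on the table, show modifications since the last update
-- and what percentage of the table's rows that represents.
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter,
       100.0 * sp.modification_counter / NULLIF(sp.rows, 0) AS percent_modified
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.BigTable');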


Specifying Statistic Row Counts

Daniel Hutmacher shows how to fake row counts using the UPDATE STATISTICS command:

In this case, the only change was that the query went parallel, which introduces a few more operators. However, a lot can change in a query when you scale the volume, and quite often, the entire layout of the plan can change dramatically.

Note that the query optimizer not only considers the rowcount, but often also the page count (which translates to how many megabytes of data need to be moved), so you may do well to include WITH PAGECOUNT as well.

This isn’t something I’ve ever done, but it could be interesting in some scenarios, such as finding out how an application will run as the database grows.
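For reference, the syntax is compact, but ROWCOUNT and PAGECOUNT are undocumented options, so treat this sketch (against a hypothetical table) as a test-environment trick only:

-- Make the optimizer plan as though dbo.SmallTable held 10 million rows
-- across 100,000 pages. Undocumented; don't run this against production.
UPDATE STATISTICS dbo.SmallTable
WITH ROWCOUNT = 10000000, PAGECOUNT = 100000;

-- The undo: have SQL Server recount the real row and page counts.
DBCC UPDATEUSAGE (0, 'dbo.SmallTable') WITH COUNT_ROWS;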


Object Level Auto-Stats

Andy Mallon on auto-statistics:

I’m not going to say whether I think Auto Update Statistics should be on or off. Instead, I’m going to argue that there are definitely scenarios when you want to have this on AND there are scenarios where you want it turned off. Can you really have this both ways? Absolutely.

I’m going to say yes to auto-update unless you know the answer is no for an object. But it’s nice to know that the fine-grained option is available.
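If you do want that per-object control, sp_autostats is the documented switch. A quick sketch, with hypothetical table and statistic names:

-- Turn automatic updates off for every statistic on one volatile table...
EXEC sp_autostats 'dbo.VolatileStaging', 'OFF';

-- ...or for just one statistic, leaving the rest automatic:
EXEC sp_autostats 'dbo.VolatileStaging', 'OFF', 'IX_VolatileStaging_LoadDate';

-- NORECOMPUTE accomplishes the same opt-out as part of a manual update:
UPDATE STATISTICS dbo.VolatileStaging IX_VolatileStaging_LoadDate
WITH FULLSCAN, NORECOMPUTE;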


Beware Statistics Corruption

Robert Davis shares a story of statistics corruption causing certain queries to fail:

I suspected that there was some difference between the queries that failed and the ones that were successful in SSMS. I ran the query they gave me, and I got the same error. I got disconnected and no further error info was returned. I also verified that the same query was successful on the other two tables mentioned. No errors on the other tables.

I wanted to know what error was causing the connection to be terminated, so I checked the SQL log and discovered that every time it failed, it was generating a stack dump. Before I was done investigating, it had generated 21 stack dumps. The key user-usable error info in the log was:

* Exception Address = 00007FF93BCF7E08 Module(sqlmin+00000000001E7E08)
* Exception Code = c0000005 EXCEPTION_ACCESS_VIOLATION
* Access Violation occurred reading address 0000000000000000

It turns out that CHECKDB & CHECKTABLE do not look at statistics. If you find queries failing with access violations like this, it’s not a bad idea to check whether corrupt statistics are the cause.
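Since DBCC won’t flag it for you, a cheap smoke test on a suspect table is to read each statistic directly; a corrupt blob should raise its error here rather than mid-workload. The names below are hypothetical:

-- Reading the header forces SQL Server to open the statistics blob.
DBCC SHOW_STATISTICS ('dbo.SuspectTable', IX_SuspectTable_Key) WITH STAT_HEADER;

-- If that errors out, rebuilding the statistic is the usual repair:
UPDATE STATISTICS dbo.SuspectTable IX_SuspectTable_Key WITH FULLSCAN;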


Build Your Own Statistics

Dan Holmes shows how to build custom statistics:

There is a performance benefit to imported stats. The cost to compute the stats is incurred on an “offline” table. The only downtime for the production table is the duration of the stream import.

This process does use undocumented features and it looks like it could be dangerous, but remember there is an easy undo: the update statistics statement. If something goes wrong, the statistics can always be updated using standard T-SQL.

Scheduling this code to run regularly can greatly help the optimizer produce better plans given a data set that changes over the tipping point but not enough to trigger a statistics update.

This feels like the type of thing you want to know because it’ll come in handy once, but if you feel the need to use it frequently, that may not be the best choice.
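For the curious, the round-trip typically rides on the undocumented STATS_STREAM option (I’m assuming that’s the mechanism here, and all names are hypothetical):

-- 1. Capture the binary stats blob from the offline staging copy.
DECLARE @stream TABLE (stats_stream varbinary(max), [rows] bigint, data_pages bigint);
INSERT @stream
EXEC ('DBCC SHOW_STATISTICS (''dbo.BigTable_Staging'', IX_BigTable_Key) WITH STATS_STREAM, NO_INFOMSGS');

-- 2. Import the blob onto the production table via dynamic SQL.
DECLARE @sql nvarchar(max);
SELECT @sql = N'UPDATE STATISTICS dbo.BigTable (IX_BigTable_Key) WITH STATS_STREAM = '
            + CONVERT(nvarchar(max), stats_stream, 1) + N';'
FROM @stream;
EXEC sys.sp_executesql @sql;

-- 3. The easy undo Dan mentions, if plans get worse:
-- UPDATE STATISTICS dbo.BigTable IX_BigTable_Key WITH FULLSCAN;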


Checking Statistics Validity

Bob Dorr has scripts to tell if your statistics are accurate:

The dilemma we all run into is: what level of SAMPLED statistics is appropriate? The answer is that you have to test, but that is not always feasible, and in the case of Microsoft CSS, we generally don’t have historical histogram states to revisit.

Microsoft CSS is engaged to help track down the source of a poorly performing query. It is a common step to locate possible cardinality mismatches and study them more closely. Studying the statistics dates, row modification counter(s), atypical parameter usage, and the like is among the fundamental troubleshooting steps.

Good post and great scripts, even if he Microsoftly nouns the verb “ask.”
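To riff on the scripts, sys.dm_db_stats_properties also gives a quick database-wide read on sampling: compare rows_sampled to rows and sort the worst offenders to the top. A sketch:

-- Sample percentage and staleness for every user-table statistic;
-- very low sample rates against skewed data are the usual suspects.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       100.0 * sp.rows_sampled / NULLIF(sp.rows, 0) AS sample_percent,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sample_percent;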
