Press "Enter" to skip to content

Category: Statistics

Multi-Column Statistics

Raul Gonzalez looks at how the different cardinality estimators handle multi-column statistics:

The thing we can learn from this is that it is impossible to always be right when you have to estimate the number of rows if your only resource is statistics. It doesn’t matter whether they are single- or multi-column; there is a set of values out there ready to defeat your logic.

However, I think it’s a good idea that SQL Server 2016 goes back to looking into multi-column statistics, for a simple reason: these are user-created stats and therefore give us (DBAs, devs) more power over how rows are estimated.

Multi-column stats are probably among the most under-utilized tools in SQL Server.
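
If you’ve never created one, the syntax is simple. Here’s a minimal sketch, with a hypothetical table and columns:

```sql
-- Sketch: a user-created, multi-column statistics object (names are placeholders).
-- The density vector covers the column combination, which separate
-- single-column auto-created stats cannot capture.
CREATE STATISTICS st_Sales_Region_Product
    ON dbo.Sales (Region, ProductID)
    WITH FULLSCAN;
```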

Index-Based Statistics Updates

Michael Bourgon has a script to get information on statistics updates for stats based on indexes:

Quickie, based off an earlier post. (http://thebakingdba.blogspot.com/2012/02/tuning-statistics-when-were-they.html)

Get the last 4 stat updates for every statistic based on an index. The filter is on auto_created; flip that to get all the system-created stats.

This does use the DBCC SHOW_STATISTICS command, which reminds me of a rant (though not about Michael’s code; it’s about the need to use this DBCC command rather than having a nice DMV which returns all of the relevant information).
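
As a rough sketch of the DMV-style alternative (this is not Michael’s script, and sys.dm_db_stats_properties() only exposes the most recent update rather than a history):

```sql
-- Sketch: latest update info for every index-based statistic in the current database.
-- Flip the auto_created filter to see system-created stats instead.
SELECT  OBJECT_NAME(s.object_id)  AS table_name,
        s.name                    AS stat_name,
        sp.last_updated,
        sp.rows,
        sp.rows_sampled,
        sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.auto_created = 0
  AND OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.last_updated DESC;
```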

Duplicate Statistics

Shaun J. Stuart discusses removing duplicate statistics:

I puzzled on this for a bit and got sidetracked by the strange way SSMS displays statistics columns on the Property page. Then it got to be the end of the day and I went home. The next day, I had a comment on my previous post from Aaron Bertrand who mentioned there is a related bug with the stats_column_id column of the sys.stats_columns view. It does not contain what the MSDN documentation says it contains. The Connect item for this, along with a workaround, can be found here.

The script I was using did not reference that column, but it did get me thinking that perhaps the script was not correctly identifying the first column in an index.

Shaun has an updated version of a duplicate statistics checker script that you may want to check out.
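
The general shape of such a check is straightforward, even if the details are fiddly. Here’s a sketch (not Shaun’s script) that flags auto-created, single-column stats shadowing the leading key column of an existing index:

```sql
-- Sketch: auto-created column stats that duplicate the leading key
-- column of an index are typically redundant.
SELECT  OBJECT_NAME(s.object_id) AS table_name,
        s.name                   AS auto_stat,
        i.name                   AS covering_index
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON sc.object_id = s.object_id
   AND sc.stats_id = s.stats_id
JOIN sys.index_columns AS ic
    ON ic.object_id = s.object_id
   AND ic.column_id = sc.column_id
   AND ic.key_ordinal = 1          -- leading key column only
JOIN sys.indexes AS i
    ON i.object_id = ic.object_id
   AND i.index_id = ic.index_id
WHERE s.auto_created = 1;
```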

Multi-Column, Auto-Created Statistics

Shaun J. Stuart looks into a scenario in which it appears that multi-column, auto-created statistics were generated:

Wow… That sure looks like three auto-created, multi-column statistics! We have three stats: stats_ids 3, 4, and 5. The sys.stats_columns table contains one row for each column that is in a statistic, so multiple rows for a single statistic (i.e., a single stats_id value) indicate multiple columns in that stat. Indeed, the column_id values indicate the table columns contained in the stat. So stats_id 3 contains columns VersionMajor and ApplicationID (column_ids 3 and 1), stats_id 4 contains columns VersionMinor and ApplicationID (column_ids 4 and 1), and stats_id 5 contains columns VersionRevision and ApplicationID (column_ids 5 and 1). And, clearly, the auto_created flag is true, so these three stats were auto-created. What’s going on?

Read on for the answer.
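
If you want to poke at this scenario yourself, the query shape is a join from sys.stats to sys.stats_columns and sys.columns. A sketch, with an assumed table name:

```sql
-- Sketch: list every statistic on a table along with its columns.
-- Note the join to sys.columns is on column_id; stats_column_id does
-- not behave the way the documentation describes.
SELECT  s.stats_id,
        s.name  AS stat_name,
        s.auto_created,
        sc.column_id,
        c.name  AS column_name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON sc.object_id = s.object_id
   AND sc.stats_id = s.stats_id
JOIN sys.columns AS c
    ON c.object_id = sc.object_id
   AND c.column_id = sc.column_id
WHERE s.object_id = OBJECT_ID('dbo.YourTable')
ORDER BY s.stats_id, sc.stats_column_id;
```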

Trace Flag 2389

Erin Stellato looks at using Trace Flag 2389 with the new cardinality estimator in SQL Server 2014:

To summarize, when using compatibility mode 110 or below, trace flag 2389 works like it always has.  But when using compatibility mode 120 or higher, and thus the new CE, the estimates are not the same compared to the old CE, and in this specific case, are not that different whether using the trace flag or not.

So what should you do?  Test, as always.  I haven’t found anything documented in MSDN that states that trace flag 2389 is not supported with compatibility mode 120 and higher, nor have I found anything that documents a change in behavior.  I do find it very interesting that the estimates are different (in this case much lower) with the new CE.  That could potentially be an issue, but there are multiple factors in play when it comes to estimates, and this was a very simple query (one table, one predicate).  In this case, the estimate is way off (4920 rows versus the 22,595 rows for the June 5 date).

I highly recommend reading this article.
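
If you want to replicate the test, the moving parts look something like this (database, table, and predicate are placeholders):

```sql
-- Sketch: compare trace flag 2389 behavior under the legacy and new CE.
ALTER DATABASE [YourDb] SET COMPATIBILITY_LEVEL = 110;  -- legacy CE
GO
SELECT COUNT(*)
FROM dbo.Orders
WHERE OrderDate > '20160605'
OPTION (QUERYTRACEON 2389);   -- per-query trace flag; requires sysadmin
GO
ALTER DATABASE [YourDb] SET COMPATIBILITY_LEVEL = 120;  -- new CE
GO
SELECT COUNT(*)
FROM dbo.Orders
WHERE OrderDate > '20160605'
OPTION (QUERYTRACEON 2389);
GO
```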

Statistic Column Sort Order

Shaun J. Stuart points out an inconsistency in display order for columns on a statistic:

What’s going on? Why are the columns in the statistic not in the same order as the columns in the index? Well, it turns out, they are. If we look on the Details page, we see the density vector is, in fact, created as Col2, Col1, Col3, which is the order of the columns in the index:

Read the whole thing to avoid confusion next time you look at the statistics GUI.
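
The density vector Shaun mentions comes from DBCC SHOW_STATISTICS; for example (object and statistic names assumed):

```sql
-- Sketch: the DENSITY_VECTOR output lists columns in their true (index key)
-- order, regardless of what the SSMS Properties page shows.
DBCC SHOW_STATISTICS ('dbo.YourTable', 'IX_YourTable_Col2_Col1_Col3')
    WITH DENSITY_VECTOR;
```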

Limitations Of Statistics

Erik Darling discusses the limitations of statistics outside of indexes:

Head rush moment: SQL may use information from histograms outside of the index it chooses for cardinality estimation.

Back to earth: If you just create a bunch of statistics instead of indexes, you’re (at best) using your Clustered Index for everything (which is still bad), or you’re using a HEAP for everything (which is usually worse). You’re still generally better off creating good indexes for your workload. They’ll get statistics objects created and associated with them, and if SQL thinks another column is interesting, it will create a single column statistics object for it, as long as you haven’t turned off auto create stats.

Sure, you can put on your black cloak and goat mask and create some multi-column or filtered statistics, but in the words of a wise man (Doug), you end up with more stats to maintain and understanding query behavior gets more difficult.

Erik argues that at the margin, keeping existing statistics up to date tends to be a better solution than creating new statistics.
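
For reference, the filtered flavor of that black-cloak-and-goat-mask work looks like this (table and column names are placeholders):

```sql
-- Sketch: a filtered statistic, with its histogram built over a slice of the table.
CREATE STATISTICS st_Posts_Recent
    ON dbo.Posts (CreationDate)
    WHERE CreationDate >= '20160101'
    WITH FULLSCAN;

-- And confirming auto-create stats is still on, so the optimizer can
-- build single-column stats when it finds a column interesting:
SELECT name, is_auto_create_stats_on
FROM sys.databases
WHERE name = DB_NAME();
```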

Indexes And Stats

Brent Ozar looks at a case where adding a suggested index monkeys with stats:

The query runs faster, make no mistake – but check out the estimates:

  • Estimated number of rows = 1
  • Actual number of rows = 165,367

Those estimates are built by SQL Server’s cardinality estimator (CE), and there have been major changes to it over the last couple of versions. You can control which CE you’re using by changing the database’s compatibility level. This particular StackOverflow database is running in 2016 compat mode – so what happens if we switch it back to 2012 compat mode?

Based on this result, there might be further optimizations available.  Read on for more of Brent’s thoughts.
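
The compatibility level flip itself is a one-liner each way; a sketch, assuming the database name:

```sql
-- Sketch: switching the StackOverflow database between cardinality estimators.
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 130;  -- 2016 compat mode, new CE
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 110;  -- 2012 compat mode, legacy CE
```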

Understanding Estimation Variance

Aaron Bertrand discusses drift between estimates and actuals in execution plans:

Now, inaccurate estimates won’t always be a problem, but they can cause issues with inefficient plan choices at the two extremes. A single plan might not be optimal when the chosen range will yield a very small or very large percentage of the table or index, and this can get very hard for SQL Server to predict when the data distribution is uneven. Joseph Sack outlined the more typical things bad estimates can affect in his post, “Ten Common Threats to Execution Plan Quality.”

Read the whole thing.
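
If you want to see where a given estimate comes from, the statistics histogram is the place to look. A sketch, with assumed names:

```sql
-- Sketch: compare the histogram's guess against reality for one range.
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_OrderDate') WITH HISTOGRAM;

-- RANGE_ROWS and EQ_ROWS for the relevant steps are what the optimizer
-- interpolates from; compare them to the actual count:
SELECT COUNT(*)
FROM dbo.Orders
WHERE OrderDate >= '20160601'
  AND OrderDate < '20160701';
```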

Where Polybase Stats Live

I dig into where the statistics against a Polybase table actually live:

Today, we learned that Polybase statistics are stored in the same way as other statistics; as far as SQL Server is concerned, they’re just more statistics built from a table (remembering that the way stats get created involves loading data into a temp table and building stats off of that temp table).  We can do most of what you’d expect with these stats, but beware calling sys.dm_db_stats_properties() on Polybase stats, as they may not show up.

Also, remember that you cannot maintain, auto-create, auto-update, or otherwise modify these stats.  The only way to modify Polybase stats is to drop and re-create them, and if you’re dealing with a large enough table, you might want to take a sample.

The result isn’t very surprising in retrospect, and it’s good that “stats are stats are stats” is the correct answer.
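
In practice, then, “modifying” a Polybase statistic means drop and re-create, optionally with a sample (external table and column names assumed):

```sql
-- Sketch: refreshing stats on a Polybase external table.
DROP STATISTICS dbo.RemoteSales.st_RemoteSales_Amount;

CREATE STATISTICS st_RemoteSales_Amount
    ON dbo.RemoteSales (Amount)
    WITH SAMPLE 25 PERCENT;  -- sampling keeps the temp-table load manageable on big tables
```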
