Press "Enter" to skip to content

Category: Statistics

Multi-Column, Auto-Created Statistics

Shaun J. Stuart looks into a scenario in which it appears that multi-column, auto-created statistics were generated:

Wow. That sure looks like three auto-created, multi-column statistics! We have three stats: stats_ids 3, 4, and 5. The sys.stats_columns table contains one row for each column that is in a statistic, so multiple rows for a single statistic (i.e., a single stats_id value) indicate multiple columns in that stat. Indeed, the column_id values indicate the table columns contained in the stat. So stats_id 3 contains columns VersionMajor and ApplicationID (column_ids 3 and 1), stats_id 4 contains columns VersionMinor and ApplicationID (column_ids 4 and 1), and stats_id 5 contains columns VersionRevision and ApplicationID (column_ids 5 and 1). And, clearly, the auto_created flag is true, so these three stats were auto-created. What’s going on?
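
For reference, a diagnostic query along these lines reproduces the output Shaun describes (a sketch; dbo.YourTable is a placeholder for the table in question):

    -- List each statistic on a table along with its columns.
    -- sys.stats_columns has one row per column per statistic.
    SELECT s.stats_id,
           s.name AS stat_name,
           s.auto_created,
           sc.stats_column_id,
           c.name AS column_name
    FROM sys.stats AS s
    JOIN sys.stats_columns AS sc
        ON sc.object_id = s.object_id
        AND sc.stats_id = s.stats_id
    JOIN sys.columns AS c
        ON c.object_id = sc.object_id
        AND c.column_id = sc.column_id
    WHERE s.object_id = OBJECT_ID('dbo.YourTable')
    ORDER BY s.stats_id, sc.stats_column_id;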

Read on for the answer.

Trace Flag 2389

Erin Stellato looks at using Trace Flag 2389 with the new cardinality estimator in SQL Server 2014:

To summarize, when using compatibility mode 110 or below, trace flag 2389 works like it always has.  But when using compatibility mode 120 or higher, and thus the new CE, the estimates are not the same compared to the old CE, and in this specific case, are not that different whether using the trace flag or not.

So what should you do?  Test, as always.  I haven’t found anything documented in MSDN that states that trace flag 2389 is not supported with compatibility mode 120 and higher, nor have I found anything that documents a change in behavior.  I do find it very interesting that the estimates are different (in this case much lower) with the new CE.  That could potentially be an issue, but there are multiple factors in play when it comes to estimates, and this was a very simple query (one table, one predicate).  In this case, the estimate is way off (4920 rows versus the 22,595 rows for the June 5 date).
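
For context, trace flag 2389 (ascending key detection) can be scoped to a single query with the QUERYTRACEON hint, and which CE you get follows the database compatibility level. A minimal sketch, using a hypothetical dbo.Orders table:

    -- Old CE: compatibility level 110 (SQL Server 2012)
    ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 110;

    -- Enable ascending-key detection for this query only
    -- (QUERYTRACEON requires sysadmin)
    SELECT COUNT(*)
    FROM dbo.Orders
    WHERE OrderDate >= '20160605'
    OPTION (QUERYTRACEON 2389);

    -- New CE: compatibility level 120 (SQL Server 2014)
    ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 120;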

I highly recommend reading this article.

Statistic Column Sort Order

Shaun J. Stuart points out an inconsistency in display order for columns on a statistic:

What’s going on? Why are the columns in the statistic not in the same order as the columns in the index? Well, it turns out, they are. If we look on the Details page, we see the density vector is, in fact, created as Col2, Col1, Col3, which is the order of the columns in the index:
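
The density vector he mentions is easy to check yourself; a quick sketch against a hypothetical dbo.TestTable with a statistic named IX_Test:

    -- Show only the density vector for a statistic; the column
    -- list in each 'All density' row reflects the true key order.
    DBCC SHOW_STATISTICS ('dbo.TestTable', 'IX_Test') WITH DENSITY_VECTOR;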

Read the whole thing to avoid confusion next time you look at the statistics GUI.

Limitations Of Statistics

Erik Darling discusses statistics and limitations of statistics outside of indexes:

Head rush moment: SQL may use information from histograms outside of the index it chooses for cardinality estimation.

Back to earth: If you just create a bunch of statistics instead of indexes, you’re (at best) using your Clustered Index for everything (which is still bad), or you’re using a HEAP for everything (which is usually worse). You’re still generally better off creating good indexes for your workload. They’ll get statistics objects created and associated with them, and if SQL thinks another column is interesting, it will create a single-column statistics object for it, as long as you haven’t turned off auto-create stats.

Sure, you can put on your black cloak and goat mask and create some multi-column or filtered statistics, but in the words of a wise man (Doug), you end up with more stats to maintain and understanding query behavior gets more difficult.
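
For the record, the multi-column and filtered statistics Erik alludes to look like this (a sketch with hypothetical table and column names):

    -- Multi-column statistic: the density vector covers the column
    -- combination, but the histogram is built only on the leading column.
    CREATE STATISTICS st_City_State
    ON dbo.Customers (City, State);

    -- Filtered statistic: built over only the rows matching the predicate
    CREATE STATISTICS st_ActiveOrderDates
    ON dbo.Orders (OrderDate)
    WHERE Status = 'Active';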

Erik argues that at the margin, keeping existing statistics up to date tends to be a better solution than creating new statistics.

Indexes And Stats

Brent Ozar looks at a case where adding a suggested index monkeys with stats:

The query runs faster, make no mistake – but check out the estimates:

  • Estimated number of rows = 1
  • Actual number of rows = 165,367

Those estimates are built by SQL Server’s cardinality estimator (CE), and there have been major changes to it over the last couple of versions. You can control which CE you’re using by changing the database’s compatibility level. This particular StackOverflow database is running in 2016 compat mode – so what happens if we switch it back to 2012 compat mode?
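
Switching the CE via compatibility level is a one-liner (assuming a database named StackOverflow; on SQL Server 2016 you can also keep the newer compat level and force the old CE with a scoped configuration):

    -- 2012 compat mode: legacy cardinality estimator
    ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 110;

    -- 2016 compat mode: new cardinality estimator
    ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 130;

    -- Alternative on 2016+: keep the compat level, use the old CE anyway
    ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;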

Based on this result, there might be further optimizations available.  Read on for more of Brent’s thoughts.

Understanding Estimation Variance

Aaron Bertrand discusses drift between estimates and actuals in execution plans:

Now, inaccurate estimates won’t always be a problem, but they can cause issues with inefficient plan choices at the two extremes. A single plan might not be optimal when the chosen range will yield a very small or very large percentage of the table or index, and this can get very hard for SQL Server to predict when the data distribution is uneven. Joseph Sack outlined the more typical things bad estimates can affect in his post, “Ten Common Threats to Execution Plan Quality.”
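
To see why uneven distributions are hard to estimate, you can inspect the histogram steps directly; a sketch using sys.dm_db_stats_histogram (available from SQL Server 2016 SP1; the table name and stats_id here are hypothetical):

    -- Wide swings in range_rows/equal_rows between steps mark the
    -- regions where range-predicate estimates are likely to drift.
    SELECT h.step_number,
           h.range_high_key,
           h.range_rows,
           h.equal_rows,
           h.average_range_rows
    FROM sys.dm_db_stats_histogram(OBJECT_ID('dbo.Orders'), 1) AS h
    ORDER BY h.step_number;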

Read the whole thing.

Where Polybase Stats Live

I dig into where the statistics against a Polybase table actually live:

Today, we learned that Polybase statistics are stored in the same way as other statistics; as far as SQL Server is concerned, they’re just more statistics built from a table (remembering that the way stats get created involves loading data into a temp table and building stats off of that temp table).  We can do most of what you’d expect with these stats, but beware calling sys.dm_db_stats_properties() on Polybase stats, as they may not show up.

Also, remember that you cannot maintain, auto-create, auto-update, or otherwise modify these stats.  The only way to modify Polybase stats is to drop and re-create them, and if you’re dealing with a large enough table, you might want to take a sample.
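
The drop-and-recreate cycle with a sample looks like this (a sketch; the external table and column names are hypothetical):

    -- Polybase stats can't be updated in place: drop and re-create,
    -- sampling to keep the temp-table load reasonable on big tables.
    DROP STATISTICS dbo.MyExternalTable.st_SaleAmount;

    CREATE STATISTICS st_SaleAmount
    ON dbo.MyExternalTable (SaleAmount)
    WITH SAMPLE 10 PERCENT;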

The result isn’t very surprising in retrospect, and it’s good that “stats are stats are stats” is the correct answer.

Messing With Statistics

Erik Darling shows how to fake stats:

One thing I’ve found is that the inflated counts don’t seem to change anything for Identities or Primary Keys. You’ll always get very reasonable plans and estimates regardless of how high you set row and page counts for those. Regular old clustered indexes are fair game.

Some really interesting things can start to happen to execution plans when SQL thinks there are this many rows in a table. The first is that SQL will use a rare (in my experience) plan choice: Index Intersection. You can think of this like a Key Lookup, but between two nonclustered indexes rather than from one nonclustered index to the clustered index.

This is very useful when you don’t have many rows in dev, can’t put many rows in dev, and can’t restore a stats-only database from prod.
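
The usual mechanism for faking counts is the undocumented ROWCOUNT and PAGECOUNT options of UPDATE STATISTICS (a dev-only sketch; the table name is hypothetical):

    -- Tell the optimizer this table 'has' ten million rows across a
    -- million pages; the actual data is untouched.
    UPDATE STATISTICS dbo.SmallDevTable
    WITH ROWCOUNT = 10000000, PAGECOUNT = 1000000;

    -- Put the real counts back when you're done experimenting
    DBCC UPDATEUSAGE (0, 'dbo.SmallDevTable') WITH COUNT_ROWS;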

Stats Terminology

Erik Darling fills in gaps on statistics terminology in his unique style:

SELECTIVITY

This tells you how special your snowflakes are. When a column is called “highly selective,” that usually means values aren’t repeating all that often, if at all. Think about order numbers, identity or sequence values, GUIDs, etc.

DENSITY

This is sort of the anti-matter to selectivity. Highly dense columns aren’t very unique. They’ll return a lot of rows for a given value. Think about Zip Codes, Gender, Marital Status, etc. If you were to select all the people in 10002, a densely (there’s that word again) populated zip code in Chinatown, you’d probably wait a while, kill the query, and add another filter.
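
Density has a simple definition: 1 divided by the number of distinct values, so you can sanity-check a column yourself (hypothetical table, column, and statistic names):

    -- Density = 1 / (# of distinct values); closer to 1 means denser.
    SELECT 1.0 / COUNT(DISTINCT ZipCode) AS density
    FROM dbo.People;

    -- Compare against the 'All density' column of the statistics object
    DBCC SHOW_STATISTICS ('dbo.People', 'st_ZipCode') WITH DENSITY_VECTOR;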

Combine that with Kendra Little’s statistics FAQ for additional learning.

Statistics FAQ

Kendra Little has a great FAQ on statistics, from the standpoint of developers as well as administrators:

I’ve been asked a lot of questions about updating statistics in SQL Server over the years. And I’ve asked a lot of questions myself! Here’s a rundown of all the practical questions that I tend to get about how to maintain these in SQL Server.

I don’t dig into the internals of statistics and optimization in this post. If you’re interested in that, head on over and read the fahhhbulous white paper, Statistics Used by the Query Optimizer in SQL Server 2008. Then wait a couple days and chase it with its charming cousin, Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator.

I’m also not talking about statistics for memory optimized tables in this article. If you’re interested in those, head over here.

This is a great distillation of important and oft-misunderstood content.
