Press "Enter" to skip to content

Category: Statistics

Viewing Stats Used in Creating Execution Plans

Matthew McGiffen shows us how to find the statistics used when generating an execution plan:

Statistics are vital in allowing SQL Server to execute your queries in the most performant manner. Having a deep understanding of how the SQL Server Optimizer interacts with Statistics really helps when you are performance tuning

One thing that can be useful when looking at an execution plan is to understand what statistics objects the optimizer used to come up with the plan. In this post we look at how that can be achieved using the undocumented traceflag 8666 which can be used to save internal debugging informational into the plan XML – including details of the Statistics objects used. 

Click through for a couple of caveats about this, as well as a primer on how to see those precious statistics.

Comments closed

Number of Rows Automatically Sampled versus Table Size

Matthew McGiffen does the math:

I mentioned in my previous post about manually updating statistics that you can specify whether they’re updated using a full scan, or you can specify an amount of data to sample, either a percentage of the table size, or a fixed number of rows. You can also choose not to specify this, and SQL Server will decide for you whether to do a full scan, or to sample a certain amount of data.

I thought it would be interesting to look at what the sample sizes are that SQL will choose to use, depending on the amount of data in your table. 

Click through for the result of Matthew’s analysis.

Comments closed

How to Update Statistics Manually

Matthew McGiffen takes us through the process of updating statistics:

At the heart of all the methods we’ll look at is the UPDATE STATISTICS command. There are a lot of options for using this command, but we’ll just focus on the ones you’re most likely to use. For full documentation here is the official reference:

Even if you have an automated system, knowing how to update statistics is a great thing because you might need to run a one-off update to help a poorly-performing query. Or you’re using PolyBase, which doesn’t have the capability to perform statistics updates automatically because the data isn’t actually in SQL Server.

Comments closed

Statistics and Ascending Keys

Matthew McGiffen looks at a common problem with statistics:

The Ascending Key Problem relates to the most recently inserted data in your table which is therefore also the data that may not have been sampled and included in the statistics histograms. This sort of issue is one of the reasons it can be critical to update your statistics more regularly than the built-in automatic thresholds.

We’ll look at the problem itself, but also some of the mitigations that you can take to deal with it within SQL Server.

Click through for more detail.

Comments closed

Estimating Row Counts without Statistics

Matthew McGiffen dives into rules of thumb:

I find this is a question that comes up again and agan. What estimate for the number of rows returned does SQL Server use if you’re selecting from a column where there are no statistics available?

There are a few different algorithms used depending on how you’re querying the table. In this post we’ll look at where we have a predicate looking for a fixed value.

Read on for a few examples, noting that this specifically relates to tables and not things like table-valued parameters.

Comments closed

When Statistics Get Updated

Matthew McGiffen never promised us there would be no math:

Before using values from the statistics, the Optimizer will check to see if the statistics are “stale”, i.e. the modification counter exceeds a given threshold. If it does, SQL will trigger a resampling of the statistics before going on to form an execution plan. This means that the plan will be formed against up to date statistics for the table.

Read on for some experimentation on when stats updates kick in.

1 Comment

Row Estimates with Table Variables

Gail Shaw explains when table variables estimate one row and when they can generate estimates above one row:

At first glance, the question of how many rows are estimated from a table variable is easy.

But, is it really that simple? Well, not really. To dig into the why, first we need to identify why table variables estimate 1 row. The obvious answer is because they don’t have statistics. However…

Read on to learn the real answer.

Comments closed

The Default Cardinality Estimator and Ascending Keys

Erik Darling compares cardinality estimators:

Look, I’m not saying there’s only one thing that the “Default” cardinality estimator does better than the “Legacy” cardinality estimator. All I’m saying is that this is one thing that I think it does better.

What’s that one thing? Ascending keys. In particular, when queries search for values that haven’t quite made it to the histogram yet because a stats update hasn’t occurred since they landed in the mix.

Read the whole thing.

Comments closed

Statistics Aren’t Guarantees

Brent Ozar lays out an important point:

You might say, “But SQL Server has statistics on those columns, and it knows what the top values are!” Well, that’s true, but…data can change without the statistics being updated. For example, say that one user logs in right now, and then we run the MAX query again:

Statistics tell the engine what they learned at the time they were ran. If you need guarantees, that’s what constraints are for.

Comments closed

Caching and Statistics in Synapse Dedicated SQL Pools

Tsuyoshi Matsuzaki takes us through statistics and caching in Azure Synapse Analytics Dedicated SQL Pools:

In Synapse Analytics, several database objects (such as, compiled procedure, plan, …) will be cached in some conditions.
For instance, CCI tables (see my previous post “Azure Synapse Analytics : Choose Right Index and Partition” for CCI) will locally cache the recently-used columnstore segments on distributed compute nodes, which is called columnar cache. The local disk-based cache is used on Gen2 caching.
You cannot manually control these caching activities. (These are automatically applied to improve performance in Synapse Analytics.) See team blog “Adaptive caching powers Azure SQL Data Warehouse performance gains” for underlying architecture which improves caching in Gen2.

Dedicated SQL Pool behavior is close enough to on-premises SQL Server that it’s easy to expect everything to be the same, but there are some nuances.

Comments closed