Press "Enter" to skip to content

Category: Statistics

How to Update Statistics Manually

Matthew McGiffen takes us through the process of updating statistics:

At the heart of all the methods we’ll look at is the UPDATE STATISTICS command. There are a lot of options for using this command, but we’ll just focus on the ones you’re most likely to use. For full documentation, here is the official reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/update-statistics-transact-sql

Even if you have an automated system in place, knowing how to update statistics manually is valuable: you might need to run a one-off update to help a poorly performing query, or you might be using PolyBase, which can’t perform statistics updates automatically because the data isn’t actually in SQL Server.
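
For quick reference, here is a minimal sketch of the forms you’re most likely to use (the table and statistic names below are placeholders):

    -- Update all statistics on a table, using the default sample rate
    UPDATE STATISTICS dbo.SalesOrder;

    -- Update a single statistic, scanning every row for maximum accuracy
    UPDATE STATISTICS dbo.SalesOrder IX_SalesOrder_OrderDate WITH FULLSCAN;

    -- Update with an explicit sample size
    UPDATE STATISTICS dbo.SalesOrder WITH SAMPLE 25 PERCENT;

    -- Update every statistic in the current database
    EXEC sys.sp_updatestats;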


Statistics and Ascending Keys

Matthew McGiffen looks at a common problem with statistics:

The Ascending Key Problem relates to the most recently inserted data in your table which is therefore also the data that may not have been sampled and included in the statistics histograms. This sort of issue is one of the reasons it can be critical to update your statistics more regularly than the built-in automatic thresholds.

We’ll look at the problem itself, but also some of the mitigations that you can take to deal with it within SQL Server.

Click through for more detail.
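
As a rough sketch of how you might spot the problem (the table and statistic names here are hypothetical), compare the top of the histogram against the live maximum, and refresh the statistic manually if recent rows have outrun it:

    -- Assume dbo.Orders has an ever-increasing OrderDate column.
    -- The last RANGE_HI_KEY in the histogram shows the newest value
    -- the statistics know about:
    DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_OrderDate') WITH HISTOGRAM;

    -- The live data may already be well past it:
    SELECT MAX(OrderDate) FROM dbo.Orders;

    -- A manual refresh brings the histogram up to date:
    UPDATE STATISTICS dbo.Orders IX_Orders_OrderDate WITH FULLSCAN;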


Estimating Row Counts without Statistics

Matthew McGiffen dives into rules of thumb:

I find this is a question that comes up again and again. What estimate for the number of rows returned does SQL Server use if you’re selecting from a column where there are no statistics available?

There are a few different algorithms used depending on how you’re querying the table. In this post we’ll look at where we have a predicate looking for a fixed value.

Read on for a few examples, noting that this specifically relates to tables and not things like table-valued parameters.
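
If you want to reproduce the experiment, a sketch along these lines works in a scratch database (all object names are hypothetical; compare the estimated and actual row counts in the execution plan):

    -- Keep SQL Server from creating statistics automatically
    ALTER DATABASE StatsTest SET AUTO_CREATE_STATISTICS OFF;

    CREATE TABLE dbo.NoStats (Id INT IDENTITY PRIMARY KEY, Val INT NOT NULL);

    -- 10,000 rows with values 0-99
    INSERT dbo.NoStats (Val)
    SELECT TOP (10000) ABS(CHECKSUM(NEWID())) % 100
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

    -- With the actual execution plan captured, check the estimate
    -- for a fixed-value predicate on a column with no statistics:
    SELECT * FROM dbo.NoStats WHERE Val = 42;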


When Statistics Get Updated

Matthew McGiffen never promised us there would be no math:

Before using values from the statistics, the Optimizer will check to see if the statistics are “stale”, i.e. the modification counter exceeds a given threshold. If it does, SQL will trigger a resampling of the statistics before going on to form an execution plan. This means that the plan will be formed against up-to-date statistics for the table.

Read on for some experimentation on when stats updates kick in.
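
If you want to watch the counter yourself, sys.dm_db_stats_properties exposes it directly (dbo.Orders below is a placeholder):

    -- The modification counter the optimizer compares against its threshold
    SELECT  s.name AS stat_name,
            sp.last_updated,
            sp.rows,
            sp.rows_sampled,
            sp.modification_counter
    FROM    sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE   s.object_id = OBJECT_ID('dbo.Orders');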


Row Estimates with Table Variables

Gail Shaw explains when table variables are estimated at one row and when the estimate can be higher:

At first glance, the question of how many rows are estimated from a table variable is easy.

But, is it really that simple? Well, not really. To dig into the why, first we need to identify why table variables estimate 1 row. The obvious answer is because they don’t have statistics. However…

Read on to learn the real answer.
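
A minimal repro of the classic behavior looks something like this (note that deferred table variable compilation in SQL Server 2019 changes the first estimate):

    DECLARE @t TABLE (Id INT NOT NULL);

    INSERT @t (Id)
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects;

    -- Estimated rows: 1. The plan was compiled before @t was populated.
    SELECT * FROM @t;

    -- Estimated rows: 1000. Recompiling after population lets the optimizer
    -- see the table variable's actual cardinality (though still no column stats).
    SELECT * FROM @t OPTION (RECOMPILE);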


The Default Cardinality Estimator and Ascending Keys

Erik Darling compares cardinality estimators:

Look, I’m not saying there’s only one thing that the “Default” cardinality estimator does better than the “Legacy” cardinality estimator. All I’m saying is that this is one thing that I think it does better.

What’s that one thing? Ascending keys. In particular, when queries search for values that haven’t quite made it to the histogram yet because a stats update hasn’t occurred since they landed in the mix.

Read the whole thing.
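
If you want to compare the two estimators on your own ascending-key scenario, a query-level hint makes it easy. The table name and date literal below are placeholders, with the literal assumed to be newer than the histogram’s last step:

    -- Default CE: runs under the database's current compatibility level
    SELECT COUNT(*)
    FROM dbo.Orders
    WHERE OrderDate >= '20250101';

    -- Legacy CE: forced for this query only (SQL Server 2016 SP1 and later)
    SELECT COUNT(*)
    FROM dbo.Orders
    WHERE OrderDate >= '20250101'
    OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));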


Statistics Aren’t Guarantees

Brent Ozar lays out an important point:

You might say, “But SQL Server has statistics on those columns, and it knows what the top values are!” Well, that’s true, but…data can change without the statistics being updated. For example, say that one user logs in right now, and then we run the MAX query again:

Statistics tell the engine what they learned at the time they were run. If you need guarantees, that’s what constraints are for.
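
To put that last line in concrete terms (with hypothetical names): a trusted CHECK constraint is a promise the optimizer can rely on at compile time, in a way statistics never are:

    -- Validated against existing rows, so the optimizer trusts it
    ALTER TABLE dbo.Users
    ADD CONSTRAINT CK_Users_LastAccessDate
        CHECK (LastAccessDate >= '20080101');

    -- Contradiction detection: the optimizer can prove this returns nothing
    -- without touching the table, no matter what the statistics say
    SELECT * FROM dbo.Users WHERE LastAccessDate < '20080101';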


Caching and Statistics in Synapse Dedicated SQL Pools

Tsuyoshi Matsuzaki takes us through statistics and caching in Azure Synapse Analytics Dedicated SQL Pools:

In Synapse Analytics, several kinds of database objects (such as compiled procedures and plans) are cached under certain conditions.
For instance, CCI tables (see my previous post “Azure Synapse Analytics : Choose Right Index and Partition” for CCI) will locally cache the recently-used columnstore segments on the distributed compute nodes, which is called the columnar cache. In Gen2, a local disk-based cache is used.
You cannot manually control these caching activities. (They are applied automatically to improve performance in Synapse Analytics.) See the team blog “Adaptive caching powers Azure SQL Data Warehouse performance gains” for the underlying architecture that improves caching in Gen2.

Dedicated SQL Pool behavior is close enough to on-premises SQL Server that it’s easy to expect everything to be the same, but there are some nuances.


Thinking about Temporary Stats on Snapshots

Lonny Niederstadt shares an interesting scenario:

Consider a snapshot database which is created daily.  The purpose is to provide analytics reporting access while maintenance or data loads take place in the source database.  In the snapshot database, analytics reports have no locking concerns from the activity in the underlying source database.  And the temporary statistics provided by SQL Server, combined with the statistics inherited from the source database, provide a lot of information to the optimizer for query plan selection.
But what if significant fact tables are queried in the snapshot and leave a situation like col2 in stats_test?  A column which generates an auto-created stat in the snapshot, but never gets a statistic created in the source database.  Each day, the cost of creating that statistic and every statistic like it will be paid as part of the workload.  Even if the underlying table is a now-stable dimension. 

Click through for the demonstration.
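
You can spot these throwaway statistics in the snapshot through sys.stats, and one plausible mitigation is to create the statistic once in the source database so every future snapshot inherits it (the object names come from Lonny’s stats_test demo):

    -- Run in the snapshot: statistics created on the fly are flagged as temporary
    SELECT  OBJECT_NAME(object_id) AS table_name,
            name,
            auto_created,
            is_temporary
    FROM    sys.stats
    WHERE   is_temporary = 1;

    -- Run in the source database: once the statistic exists here,
    -- snapshots inherit it instead of rebuilding it on first use
    CREATE STATISTICS st_stats_test_col2 ON dbo.stats_test (col2);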
