Statistics – Page 5 – Curated SQL

Caching and Statistics in Synapse Dedicated SQL Pools

In Synapse Analytics, several database objects (such as compiled procedures, plans, …) are cached under certain conditions.
For instance, CCI tables (see my previous post “Azure Synapse Analytics : Choose Right Index and Partition” for CCI) locally cache recently used columnstore segments on the distributed compute nodes; this is called the columnar cache. In Gen2, this cache is local and disk-based.
You cannot control this caching manually; Synapse Analytics applies it automatically to improve performance. See the team blog post “Adaptive caching powers Azure SQL Data Warehouse performance gains” for the underlying architecture that improves caching in Gen2.

Dedicated SQL Pool behavior is close enough to on-premises SQL Server that it’s easy to expect everything to be the same, but there are some nuances.

Comments closed

Thinking about Temporary Stats on Snapshots

Published 2020-09-14 by Kevin Feasel

Lonny Niederstadt shares an interesting scenario:

Consider a snapshot database which is created daily. The purpose is to provide analytics reporting access while maintenance or data loads take place in the source database. In the snapshot database, analytics reports have no locking concerns from the activity in the underlying source database. And the temporary statistics provided by SQL Server, combined with the statistics inherited from the source database, provide a lot of information to the optimizer for query plan selection.
But what if significant fact tables are queried in the snapshot and leave a situation like col2 in stats_test? A column which generates an auto-created stat in the snapshot, but never gets a statistic created in the source database. Each day, the cost of creating that statistic and every statistic like it will be paid as part of the workload. Even if the underlying table is a now-stable dimension.

Click through for the demonstration.

Persist Sample Percent Doesn’t

Published 2020-08-26 by Kevin Feasel

John Sterrett takes us through a bug in SQL Server:

When a bug jumps out and surprises me I like to share it so others do not run into the same unexpected result. I don’t think there is any reason for a bug to bite multiple people in the butt. Therefore, I want to show you why persist sample percent IS NOT PERSISTED!

Read on for the repro and upvote the Feedback item.

Statistics Management with Azure SQL DB Serverless

Published 2020-08-19 by Kevin Feasel

Joey D’Antoni takes us through stats management with the serverless tier of Azure SQL Database:

One of the only things platform-as-a-service databases like Azure SQL Database do not do for you is actively manage column and index statistics. While backups, patches, and even integrity checks are built into the platform services, managing your metadata is not. Since Azure SQL Database lacks a SQL Server Agent for scheduling, you have to use an alternative for job scheduling.

Read on to learn about techniques as well as a few gotchas.

Improving Async Stats Update Concurrency

Published 2020-06-17 by Kevin Feasel

Dimitri Furman announces a change in Azure SQL Database:

In Azure SQL Database and Azure SQL Managed Instance, the background process that updates statistics asynchronously can now wait for the schema modification lock on a low priority queue. This improves concurrency for workloads with frequent query plan (re)compilations.
New behavior is enabled with the ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY database-scoped configuration. This feature is currently in public preview.

Dimitri does a good job of explaining what this means and how it can make life a little better for people querying tables with statistics updates.

Determining Statistics Utilization

Published 2020-05-11 by Kevin Feasel

Deborah Melkin shows us how to see if a particular statistic is in use:

You know those tweets that you see once but can never find again? I remember seeing one a while ago where someone tweeted to #sqlhelp asking if the internal inserted and deleted tables had statistics or if they were like table variables, which didn’t.
This is a great question in general. But then it got me thinking – how do you prove this? I wanted to know the answer as well so I decided to look into this. And I went down the wrong sort of rabbit hole trying to figure this out. Eventually I talked to a friend about this and got pointed in the right direction…
And the answer to how you find which statistics are used is…?

Read on for the answer and several examples.

Decoding Statistics Names

Published 2020-04-20 by Kevin Feasel

Jason Brimhall explains how SQL Server comes up with names for auto-created statistics:

Every now and again I am asked about the meaning behind the automatically generated names for statistics in SQL Server. The quick answer is short, sweet and really easy. I give them a quick explanation and then often refer them to the blog post by Paul Randal on the topic.
The better answer is to show them what the auto-generated names really mean, alongside the great explanation from Paul. Finally, after years of the topic being on my backlog, I am sharing a script that will help decode those names and help to prove out fully what’s in a statistic name.

The proof is in the SQL; click through to see it.
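
Auto-created statistics in recent SQL Server versions are commonly described as following the pattern _WA_Sys_<column id in hex>_<object id in hex>. Here is a minimal sketch of decoding such a name under that assumption. It is not Jason's script, and the names decode_auto_stat_name and AutoStatName are made up for illustration.

```python
import re
from typing import NamedTuple

# Assumed naming pattern: _WA_Sys_<8 hex digits: column id>_<8 hex digits: object id>
_AUTO_STAT_NAME = re.compile(r"^_WA_Sys_([0-9A-Fa-f]{8})_([0-9A-Fa-f]{8})$")


class AutoStatName(NamedTuple):
    column_id: int
    object_id: int


def decode_auto_stat_name(name: str) -> AutoStatName:
    """Split an auto-created statistic name into its column id and object id."""
    match = _AUTO_STAT_NAME.match(name)
    if match is None:
        raise ValueError(f"not an auto-created statistic name: {name!r}")
    # Both parts are read as plain (unsigned) hexadecimal numbers.
    return AutoStatName(
        column_id=int(match.group(1), 16),
        object_id=int(match.group(2), 16),
    )


decoded = decode_auto_stat_name("_WA_Sys_00000002_0EA330E9")
print(decoded.column_id, decoded.object_id)
```

The decoded values can then be compared against object_id and column_id in sys.columns to see which table and column the statistic covers.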

The Costs of Bad Statistics

Published 2019-09-20 by Kevin Feasel

Monica Rathbun explains what happens when statistics go wrong:

Over Estimations of Rows (Estimated > Actual) lead to:
– Selection of parallel plan when a serial plan might be more optimal
– Inappropriate join strategy selections
– Inefficient Index Navigation (scan versus seek)
– Inflated Memory Grants

Read the whole thing. The optimizer doesn’t get to look at actual data when determining plans (save for something like adaptive query join processing, but that’s pretty rare), so statistics are its link to reality.
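
To keep the direction straight: an overestimate means the optimizer expected more rows than the operator actually returned. The helper below is only an illustrative sketch. The function name and the idea of feeding it the estimated and actual row counts read off an execution plan are assumptions, not something from Monica's post.

```python
def classify_estimate(estimated_rows: float, actual_rows: float) -> str:
    """Label a cardinality estimate relative to the actual row count."""
    if estimated_rows > actual_rows:
        return "overestimate"   # e.g. risk of inflated memory grants
    if estimated_rows < actual_rows:
        return "underestimate"
    return "accurate"


# Hypothetical numbers for a single plan operator.
print(classify_estimate(estimated_rows=50_000, actual_rows=120))
```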

Estimates outside the Histogram Range

Published 2019-08-30 by Kevin Feasel

Josh Darnell shows us how SQL Server calculates estimates for input values outside of the range of your relevant statistic’s histogram:

I have the impression that CSelCalcColumnInInterval “fails” if the predicate doesn’t fall within any of the histogram intervals. The estimation logic then chooses to try the CSelCalcAscendingKeyFilter calculator (a reference to the “ascending key problem”) if the predicate is specifically higher than the last histogram interval.

Josh includes a couple of demos as well, so check them out.
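
The case Josh describes depends on where a predicate value sits relative to the histogram. The sketch below only checks that position. It does not reproduce SQL Server's estimation logic, and the function name is hypothetical. It assumes the histogram steps are given as an ascending list of RANGE_HI_KEY values, as in DBCC SHOW_STATISTICS output, with the first step holding the lowest value.

```python
from typing import Sequence


def position_vs_histogram(value: float, range_hi_keys: Sequence[float]) -> str:
    """Report whether a predicate value lies below, within, or above a histogram."""
    if not range_hi_keys:
        raise ValueError("histogram has no steps")
    lowest, highest = range_hi_keys[0], range_hi_keys[-1]
    if value < lowest:
        return "below"
    if value > highest:
        # The situation from the quote: the predicate is higher than the last step.
        return "above"
    return "within"


steps = [1, 250, 500, 750, 1000]  # made-up step boundaries
print(position_vs_histogram(1200, steps))
```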

Waiting on Stats Refreshes

Published 2019-08-21 by Kevin Feasel

Erik Darling looks at a new wait type:

I mean ever really wondered just how long a query of yours waited on stats to automatically update before running?
Have you ever been absolutely, positively sure that those gosh dang darn stats updates were putting your query on the trolley to Slowsville?
Your wishes will be 100% granted in SQL Server 2019.

This is a wait type that I’d consider useful but hopefully uncommon.
