Press "Enter" to skip to content

Category: Statistics

The Cost of Maintaining Extended Statistics in Postgres

Andrew Lepikhov breaks out the stopwatch:

In the previous post, I passionately advocated for integrating extended statistics and, moreover, creating them automatically. But what if it is too computationally demanding to keep statistics fresh?

This time, I will roll up my sleeves, get into the nitty-gritty and shed light on the burden extended statistics put on the digital shoulders of the database instance. Let’s set aside the cost of using this type of statistics during planning and focus on one aspect – how much time we will spend in an ANALYZE command execution.

Read the whole thing if you’re a Postgres admin or developer.

Leave a Comment

Merge Join vs Hash Join in Postgres

Andrei Lepikhov compares two physical join operators:

Today’s post is sparked by a puzzling observation: users, especially those who use an abstraction layer like REST or ORM library to interact with databases, frequently disable the MergeJoin option across the entire database instance. They justify this action by citing numerous instances of performance degradation.

Considering how many interesting execution paths MergeJoin adds to the system elaborating IncrementalSort or sort orderings derived from underlying IndexScan, it looks strange: one more bug of skewed cost balance inside the PostgreSQL cost model?

This is an interesting peek into how complex the query optimizers in database engines are, as well as how small amounts of information (via statistics or indexes) can matter to a query.

Leave a Comment

Blocking from Async Stats Updates

Tom Zika diagnoses an issue:

I recently encountered an issue where an index rebuild set to wait_at_low_priority ended up blocking an asynchronous statistics update. This interaction led to a large blocking chain where queries were waiting on the async stats update and started to timeout.

Read on for an explanation of all of the players involved, then a demo, and finally two solutions.

Comments closed

Finding Row Counts in SQL Server

Kevin Wilkie breaks out the abacus:

Today, I was working with SQL Server to get row counts from several tables so I thought I’d be smart and work with some functions in SQL Server to make it smarter / easier.

Now, if I am truly only getting “straight” row counts from these tables, I would be able to create a query like the below that would provide the answers with no problem:

Read on for the normal approach, as well as a more complicated approach made necessary due to some business logic requirements.

Comments closed

Finding Duplicate Statistics in SQL Server

Jose Manuel Jurado Diaz searches for clones:

Some time ago, we encountered a support case where a customer experienced significant delays in updating auto-created and user-created statistics. I would like to share the insights gained from this experience, including the underlying causes of the issue and the potential solutions we identified and implemented to address the problem effectively.

Read on for a demo to set up the scenario and the cause of the problem, as well as how to fix it.

Comments closed

The Risk of Query Failure on Readable Secondaries in SQL Server

Kendra Little explains a problem:

If you use readable secondaries in Availability Groups or Read-Scale out instances in Azure SQL Managed Instance, you may have queries fail repeatedly if there is a glitch and statistics are not successfully “refreshed” on the secondary replica. Those queries may keep failing until you manually intervene.

This has been the case for a few years, and it’s unclear if Microsoft will fix this. There is a well established support deflection article which documents the issue and provides ‘workarounds’.

Read on for Kendra’s thoughts. I haven’t run into this before, myself, but I also don’t tend to make very heavy use of readable secondaries.

Comments closed

Metadata-Based Counting and Filtered Indexes

Aaron Bertrand counts more efficiently:

That’s great when you want to count the whole table without size-of-entire-table reads. It gets more complicated if you need to retrieve the count of rows that meet – or don’t meet – some criteria. Sometimes an index can help, but not always, depending on how complex the criteria might be.

For me, counting more efficiently typically means I take off my shoes.

One other note is, if you just need a guesstimate, or if the cardinality of that column you’re splitting by is fairly low, you could also look at the histogram, especially if there’s a statistic on the column (or columns) you’re interested in. It’s rare that I think to go that way, but it is one of the tools the optimizer itself uses, so it’s fair game.

Comments closed

Manual Stats Updates in SQL Server

Matthew McGiffen can’t wait for SQL Server to update those stats:

Having up to date statistics is vital for getting the best performance out of your queries. Even though SQL Server automatically updates statistics in the background for you (When do statistics get updated?), you may find there are times when you want to manage updating them yourself.

Click through to learn several techniques for stats updates.

Comments closed

Statistics and Ascending Keys in SQL Server

Matthew McGiffen talks stats:

The Ascending Key Problem relates to the most recently inserted data in your table which is therefore also the data that may not have been sampled and included in the statistics histograms. This sort of issue is one of the reasons it can be critical to update your statistics more regularly than the built-in automatic thresholds.

We’ll look at the problem itself, but also some of the mitigations that you can take to deal with it within SQL Server.

Read on to see how this behaved prior to SQL Server 2014’s new cardinality estimator and what has changed since then.

Comments closed

When Statistics Updates Happen

Matthew McGiffen gives us the numbers:

SQL Server has had the ability to automatically update statistics since version 7.0. Nonetheless for a long part of my career working with SQL Server, whenever a performance issue raised its head everyone’s knee-jerk response would be “Update Statistics!” In most cases though the people shouting that didn’t really understand what the “Statistics” were, or what mechanisms might already be in place for keeping them up to date.

Of course SQL Server isn’t perfect and sometimes it is helpful for human intelligence to intervene. But to provide intelligent intervention one has to understand how things work.

Read on to learn what triggers automatic stats updates in various versions of SQL Server.

Comments closed