Statistics – Curated SQL

PostgreSQL Query Tuning for the Oracle DBA

Published 2025-06-25 by Kevin Feasel

Kellyn Gorman continues a series on PostgreSQL for Oracle DBAs:

As an Oracle DBA venturing into the world of PostgreSQL, one of the most important areas to get comfortable with is performance tuning. While Oracle’s Cost-Based Optimizer (CBO) is a well-known powerhouse that many DBAs have learned to both respect and wrestle with, PostgreSQL offers its own sophisticated query planner that behaves differently. Understanding these differences is key to becoming proficient with PostgreSQL as we step into tuning.

Read on to gain an overview of what options are available on the PostgreSQL side and how they differ from what’s available in Oracle.

Maintaining Statistics Information Post-Update in PostgreSQL 18

Published 2025-06-18 by Kevin Feasel

Laurenz Albe takes a peek at an upcoming feature:

Everybody wants good performance. When it comes to the execution of SQL statements, accurate optimizer statistics are key. With the upcoming v18 release, PostgreSQL will preserve the optimizer statistics during an upgrade with dump/restore or pg_upgrade (see commit 1fd1bd8710 and following). With the beta testing season for PostgreSQL v18 opened, it is time to get acquainted with the new feature.

It’s kind of wild to me that this wasn’t in place years ago for PostgreSQL.

Automated Table Statistics on Delta Tables in Microsoft Fabric

Published 2025-05-30 by Kevin Feasel

Santhosh Kumar Ravindran makes an announcement:

We’re thrilled to introduce Automated Table Statistics in Microsoft Fabric Data Engineering — a major upgrade that helps you get blazing-fast query performance with zero manual effort.

Whether you’re running complex joins, large aggregations, or heavy filtering workloads, Fabric’s new automated statistics will help Spark make smarter decisions, saving you time, compute, and money.

Click through to see what’s included, as well as the limitations associated with this. You can still create manual statistics if you’d like, so on the whole, I approve.

Comparing Varieties of Statistics in SQL Server

Published 2025-04-15 by Kevin Feasel

Kendra Little gets the smorgasbord:

Statistics in SQL Server are simple in theory: they help the optimizer estimate how many rows a query might return.

In practice? Things get weird fast. Especially when you start filtering on multiple columns, or wondering why the optimizer thinks millions of rows are coming back when you know it’s more like a few hundred thousand.

In this post, I’ll walk through examples using single-column, multi-column, and filtered statistics—and show where estimates go off the rails, when they get back on track, and why that doesn’t always mean you need to update everything with FULLSCAN.

Read on for a review of the three types of statistics. Admittedly, I’ve never had much luck with filtered statistics improving the performance of queries. If I were to speculate, I’d say that they’re good for a very specific type of problem that maybe I just don’t run into that often.

Automating Management of Extended Statistics in PostgreSQL

Published 2025-03-10 by Kevin Feasel

Andrei Lepikhov builds an extension:

The extended statistics tool allows you to tell Postgres that additional statistics should be collected for a particular set of table columns. Why is this necessary? I will try to explain quickly, using the example of an open power plant database. For example, the fuel type (primary_fuel) used by a power plant is implicitly associated with the country’s name.

Click through to learn more about what extended statistics are and the nature of the extension.
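
To make the correlation problem concrete, here is a minimal sketch in Python. It is not how PostgreSQL computes estimates internally; it only mimics the independence assumption a planner falls back on when it has nothing but single-column statistics. The rows are invented for illustration, and only the column name primary_fuel comes from the excerpt above.

```python
# Hypothetical toy data: (country, primary_fuel) pairs, invented for illustration.
rows = (
    [("Norway", "Hydro")] * 90
    + [("Norway", "Gas")] * 10
    + [("Poland", "Coal")] * 80
    + [("Poland", "Gas")] * 20
)


def selectivity(predicate):
    """Return the fraction of rows that satisfy the predicate."""
    return sum(1 for r in rows if predicate(r)) / len(rows)


# Per-column selectivities: all that single-column statistics can provide.
sel_country = selectivity(lambda r: r[0] == "Norway")
sel_fuel = selectivity(lambda r: r[1] == "Hydro")

# Independence assumption: multiply the selectivities, then scale by row count.
independent_estimate = sel_country * sel_fuel * len(rows)

# What the data actually contains for country = 'Norway' AND primary_fuel = 'Hydro'.
actual = sum(1 for r in rows if r == ("Norway", "Hydro"))

print(f"independent estimate: {independent_estimate:.1f}, actual: {actual}")
```

In this toy data set the independent estimate comes out at roughly half of the true row count, because every hydro plant happens to be in Norway. Extended statistics exist to give the planner the cross-column information it needs to close that gap.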

Multi-Column Statistics in PostgreSQL

Published 2025-01-21 by Kevin Feasel

Hans-Jürgen Schönig creates new statistics:

If you are using PostgreSQL for analytics or large-scale aggregations, you might occasionally notice the planner making false assumptions regarding the number of rows. While this isn’t a problem for small aggregates, it is indeed an issue for large-scale aggregations featuring many different dimensions.

In short: The more columns your GROUP BY clause contains, the more likely it is that the optimizer overestimates the row count.

This blog explains how this can be handled in PostgreSQL.

Maybe it’s just me, but I don’t recall many instances in which adding multi-column statistics without any sort of index change significantly improved a query’s performance. I can understand how it could improve things like memory grants, so perhaps that’s how I’m selling it short. But I struggle to recall a specific case in which a query got measurably faster as a result.
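
As a rough illustration of the overestimation described in the excerpt, the following Python sketch compares a naive group-count estimate with the real number of groups. It rests on a simplifying assumption: that a planner without multi-column statistics multiplies the per-column distinct counts and caps the result at the row count. PostgreSQL's real estimator is more nuanced, and the table and column names here are hypothetical.

```python
# Hypothetical table in which each city belongs to exactly one country,
# so the two grouping columns are strongly correlated.
cities = {f"city_{i}": f"country_{i % 10}" for i in range(100)}
rows = [(country, city) for city, country in cities.items() for _ in range(100)]

# Per-column distinct counts, as single-column statistics would report them.
n_country = len({r[0] for r in rows})
n_city = len({r[1] for r in rows})

# Naive estimate for GROUP BY country, city: product of the distinct counts,
# capped at the total number of rows.
naive_groups = min(n_country * n_city, len(rows))

# Actual number of distinct (country, city) combinations.
actual_groups = len(set(rows))

print(f"naive estimate: {naive_groups}, actual groups: {actual_groups}")
```

Here the naive estimate is ten times the real group count. Each additional correlated column in the grouping multiplies the error again, which is the effect multi-column statistics are meant to counter.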

Estimates in SQL Server

Published 2024-10-07 by Kevin Feasel

Jared Poche digs into the topic of estimates:

Estimates and statistics are often discussed in our community, but I doubt the average DBA knows how they flow. So I wanted to write a post with examples showing how SQL Server estimates the rows for a specific operation.

Read on to learn more.

Query Start Times in Query Store

Published 2024-09-11 by Kevin Feasel

Hugo Kornelis describes an issue:

I was hired by a customer who had a very annoying issue with the daily data load of their data warehouse. The volume of data to be loaded is high and they were already struggling to finish the load before business opens. But that was not their biggest issue. The biggest problem, the real pain point that they hired me for, is that at unpredictable moments, the load would run much longer than normal, pushing it well into business hours. They wanted me to find out what caused those irregular delays, and find a way to stop them from happening.

Read on to learn more about the issue itself, as well as a discrepancy in what Query Store showed. Hugo also points out that the quick-and-easy solution may not be the right solution.

Planning Statistics Updates

Published 2024-08-30 by Kevin Feasel

Andy Mallon makes the case:

Let’s talk about the trade-offs.

Auto-updated stats are triggered by queries needing to access a table with “stale” stats. (It doesn’t matter how we define stale, so let’s not think about it.) With the exception of very small tables, auto-updated stats use a small random sample of the table data; in contrast, scheduled jobs usually have a larger sample size defined. I always say “FULLSCAN if you can!”

Read on for Andy’s thesis.
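
The trade-off between a small sample and a full scan can be sketched with a quick simulation. This is only an illustration under stated assumptions: the table contents, the sample size of 1,000, and the helper estimate_from_sample are all hypothetical, and it samples individual rows uniformly, which is not necessarily how SQL Server draws its statistics sample.

```python
import random

random.seed(42)  # fixed seed so repeated runs draw the same samples

# Hypothetical column: 100,000 rows, of which exactly 150 hold the value "rare".
table = ["rare"] * 150 + ["common"] * 99_850
random.shuffle(table)


def estimate_from_sample(values, target, sample_size):
    """Estimate how many rows equal `target` by scaling up a uniform row sample."""
    sample = random.sample(values, sample_size)
    hits = sum(1 for v in sample if v == target)
    return hits * len(values) / sample_size


# A full scan reads every row and therefore yields the exact count.
full_scan_count = table.count("rare")

# Five independent 1% samples: each estimate can only land on a multiple of 100.
sampled_estimates = [estimate_from_sample(table, "rare", 1_000) for _ in range(5)]

print(f"full scan: {full_scan_count}")
print(f"sampled estimates: {sampled_estimates}")
```

With a 1% sample, a value this rare shows up in the sample only a handful of times, if at all, so the scaled-up estimate moves in steps of 100 and can vary noticeably between runs. A full scan costs more to compute but removes that variance.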

SQL Server Compilation Time and Storage

Published 2024-08-12 by Kevin Feasel

Kendra Little explains how storage can affect query compilation time:

Up till now, I’ve thought of compilation time in SQL Server as being dependent only on CPU resources, not something that requires fast storage to be speedy. But that’s not quite right.

Slow storage can result in periodic long compile time in SQL Server. And long compile time not only extends the runtime for the query, it can also result in blocking with waits for compile locks.

Click through for more details, as well as a video by Erik Darling on compile-time locks.

Category: Statistics