Press "Enter" to skip to content

Category: Indexing

Read Efficiency in PostgreSQL Queries

Michael Christofides explains what’s happening under the covers:

A lot of the time in database land, our queries are I/O constrained. As such, performance work often involves reducing the number of page reads. Indexes are a prime example, but they don’t solve every issue (a couple of which we’ll now explore).

The way Postgres handles consistency while serving concurrent queries is by maintaining multiple row versions in both the main part of a table (the “heap”) as well as in the indexes (docs). Old row versions take up space, at least until they are no longer needed, and the space can be reused. This extra space is commonly referred to as “bloat”. Below we’ll look into both heap bloat and index bloat, how they can affect query performance, and what you can do to both prevent and respond to issues.

Read on for a detailed explanation.

Leave a Comment

Occasional Query Failures on a Small Table

Paul Randal troubleshoots an issue:

The table in question only had a few million rows of data in it, with a maximum row size of 60 bytes, and the query usually ran in a few seconds, but occasionally the query would ‘hang’ and would either be killed or take tens of minutes to run. Troubleshooting instrumentation when the issue happened showed no out-of-the-ordinary waits occurring, no pressure on the server, and the query plan generated when the query took a long time was essentially the same.

The only thing noticeable was that when the problem occurred, a column statistics update happened as part of query compilation, but with such a tiny table, how could that be the root cause of the issue? The calculated disk space for the row size and count worked out to be about 250MB, but with a statistics sample rate of only 4%, extended events showed an auto_stats event taking close to an hour!

Read on to learn the cause. I will admit that I did not get this one correct when I guessed what the cause could be.

Leave a Comment

Tracking Unused Indexes in PostgreSQL

Semab Tariq wants to see which indexes are in use:

Indexes exist to speed up data access. They allow PostgreSQL to avoid full table scans, significantly reducing query execution time for read-heavy workloads.

From real production experience, we have observed that well-designed, targeted indexes can improve query performance by 5× or more, especially on large transactional tables.

However, indexes are not free.

The reasons for why are very similar to what we have in SQL Server. The way to track utilization is a bit different, however.

Comments closed

Thoughts on On-Disk Rowstore in SQL Server

Hugo Kornelis starts a series on storage structures:

When a query is slow, it is often caused by inefficient access to the data. So our tuning work very frequently comes down to figuring out how data was read, and then massaging our queries or database structures to get SQL Server to access the data in a more efficient way.

So we look at scans, seeks, and lookups. We know that scans are good when we access most of the data. Or, in the case of an ordered scan, to prevent having to sort the data. We know that seeks are preferred when there is a filter in the query. And we know that lookups represent a good tradeoff between better performance and too many indexes, but only if the filter is highly selective.

All of the above is true. And all of it is highly generalized. And hence, often, not true enough to be actually useful.

Read on for an overview of the most common option.

Comments closed

Indexes and COUNT() in SQL Server

Louis Davidson does some testing:

A few weeks ago, there was a LinkedIn post (I can’t find it anymore) that covered something about how indexes were used by COUNT in SQL. I think it may have been based on SQL Server, but I am not sure (it is rare that one of the SQL posts on LinkedIn mentions a platform). At the time, I went and tried a few of the mentioned cases and realized this was an interesting question: how does the COUNT aggregate use indexes when you use various different expressions.

Louis has a series of test cases and I got most of them right, though I wasn’t sure about one particular optimization.

Comments closed

JSON Data and Columnstore Indexes

Niko Neugebauer continues a series on columnstore:

Not since SQL Server 2008 that Microsoft has added a new base data type to SQL Server, but in SQL Server 2025 they have added not 1 but whole 2 new data types – Vector and JSON. The first one (Vector) and the corresponding index (Vector Index) are described in details in the Columnstore Indexes – part 134 (“Vectors and Columnstore Indexes”) and this post is dedicated to the new JSON data type and the new JSON Index and their compatibility with the Columnstore Indexes and the Batch Execution mode.

One common trait for the Vector & JSON Indexes is that both come with a big number of limitations and they are all enabled under a “Preview” option, making them unsuitable for the most production environments.

Niko has a somewhat-humorous and somewhat-infuriating table at the beginning describing just how much support columnstore indexes have for JSON data types.

And it is another example of the frustrating way in which Microsoft will release something before it’s even half-baked, demand consumer adoption to continue working on it, and then can the feature because people can’t use the not-even-half-baked feature in its current state. There’s a fine line between rapid prototyping and quick market feedback versus strangling products in the crib, and I think they’re pretty far onto the wrong side of things when it comes to most SQL Server functionality.

Comments closed

The Cost of Heaps in SQL Server

Heaps killed Vlad Drumea’s dog and now it’s time to get revenge:

In this post I cover some unpleasant side-effects of using heaps in SQL Server for anything else than staging tables.

In the past couple of months I’ve been running into performance issues caused by the strange decision to have very active tables as heaps.
And, at this point, I’m on a personal vendetta against heaps in SQL Server.
As a result, I’m writing this blog post so that I can refer people to it the next time I’ll have to explain why heaps don’t match their use cases.

I, for one, wholeheartedly support Vlad’s vendetta. There are very rare circumstances in which it makes sense to use a heap in a non-temporary SQL Server table. Otherwise, define that clustered index.

Comments closed

Postgres Index Corruption after an OS Upgrade

Laurenz Albe has me concerned:

Most major Linux distributions have upgraded to the GNU C library version 2.28 or later. Therefore, there is a growing awareness that an operating system upgrade can lead to index corruption. However, rebuilding all your indexes can slow down the upgrade process considerably. In this article, I want to discuss how you can avoid rebuilding more indexes than necessary.

Read the whole thing if you are a Postgres DBA.

Comments closed

Page Compression on Heaps

Vlad Drumea explains why page compression might not give you quite what you expect:

I recently ran into SQL Server’s page compression being applied to a heap, and I figured I’d cover why that won’t work how some folks expect.

SQL Server’s page compression is really neat when applied on tables and indexes that are good candidates for it.
Even more so in cloud environments where storage costs can quickly add up.

Honestly, this is just a good reason to push for clustered indexes on all tables in SQL Server. I’ll call it good reason 5 out of 12.

Comments closed

What’s Missing in Columnstore Indexes

Niko Neugebauer has a list:

After spending some time thinking about the best way to come back to writing about Columnstore Indexes, after 5 and half years hiatus, I came to a conclusion that I have never published a post on what is still missing. With that in mind, I decided to mark my comeback to writing technical posts on my blog with rather simple post on the things that are needed, but did not made into the SQL Server – based engines so far (as of December 2025).

Niko has seven items on his list. I tend not to cover wish lists on Curated SQL, but when it’s Niko and columnstore indexes, I’m willing to make an exception.

Comments closed