Category: Indexing

Automatic Index Compaction

Rebecca Lewis looks at a new Azure SQL Database preview feature:

Microsoft’s announcement of Automatic Index Compaction is titled ‘Stop defragmenting and start living’. That is not an accident. Brent Ozar has been making the case for years that defragmenting indexes is largely maintenance theater — that external fragmentation barely matters on modern SSDs and shared storage and that nightly rebuild jobs hammer your transaction log and I/O for gains that are difficult to measure.

His sessions on the topic have been circulating for over a decade, and now Microsoft’s own documentation states it plainly: ‘For most workloads, a higher index fragmentation doesn’t affect query performance or resource consumption.’ I believe that may be Brent’s argument almost verbatim in their official docs.

This could be interesting.

By the way, if you want a really deep dive on index maintenance, I’ll point back to a pair of sessions Jeff Moden did for TriPASS (the Triangle Area SQL Server User Group that I run) about 5 years back and was gracious enough to let us record. They are very long user group sessions but go into detail on exactly what kinds of index write patterns benefit from rebuilds and which ones don’t, as well as a lot more.

LOB Logical Reads and Columnstore Indexes

Brent Ozar notices a difference:

Forever now, FOREVER, it’s been a standard thing where I can say, “When you’re measuring storage performance during index and query tuning, you should always use logical reads, not physical reads, because logical reads are repeatable, and physical reads aren’t. Physical reads can change based on what’s in cache, what other queries are running at the time, your SQL Server edition, and whether you’re getting read-ahead reads. Logical reads just reflect exactly the number of pages read, no matter where the data came from (storage or cache), so as long as that number goes down, you’re doing a good job.”

But this is not always the case, as Brent demonstrates.
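To make the conventional wisdom in that quote concrete, here is a minimal sketch of a toy page cache. Everything in it (the `BufferPool` class, its LRU eviction, the page counts) is a made-up illustration and not how SQL Server's buffer pool is actually implemented. It only shows why logical reads are normally the repeatable number: the same query touches the same pages either way, while physical reads depend on what is already cached.

```python
from collections import OrderedDict


class BufferPool:
    """Toy LRU page cache (hypothetical; not SQL Server's real design)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.logical_reads = 0
        self.physical_reads = 0

    def read_page(self, page_id):
        # Every page request counts as a logical read.
        self.logical_reads += 1
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
        else:
            # Cache miss: the page has to come from storage.
            self.physical_reads += 1
            self.pages[page_id] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)


def run_query(pool, page_ids):
    """Return (logical, physical) reads for one pass over page_ids."""
    start_logical, start_physical = pool.logical_reads, pool.physical_reads
    for page_id in page_ids:
        pool.read_page(page_id)
    return (pool.logical_reads - start_logical,
            pool.physical_reads - start_physical)


pool = BufferPool(capacity=100)
query_pages = range(50)               # the query touches 50 pages
cold = run_query(pool, query_pages)   # first run, empty cache
warm = run_query(pool, query_pages)   # second run, pages now cached
print("cold:", cold, "warm:", warm)
```

The logical count is identical across both runs and only the physical count moves. Brent's post is about a case where that tidy picture does not hold.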

The State of Vector Indexes in SQL Server 2025

Rebecca Lewis separates marketing hype from reality:

Microsoft’s entire marketing pitch for SQL Server 2025 is ‘the AI-ready database.’ It went GA on November 18, 2025. We are now four months in. Here is what is actually GA, what is still behind a preview flag, and what that means if you are evaluating this for production.

Read on for a list, as well as a summary of Erik Darling’s great work on the topic.

My take on this is that vector indexes are where columnstore indexes were in SQL Server 2012: a neat idea, but not ready for prime time. It took until 2016 before columnstore indexes were actually worthwhile (primarily, the introduction of clustered columnstore indexes and ability to rebuild indexes), so we’ll see if it takes as long for vector indexes to get all of the necessary functionality.

Read Efficiency in PostgreSQL Queries

Michael Christofides explains what’s happening under the covers:

A lot of the time in database land, our queries are I/O constrained. As such, performance work often involves reducing the number of page reads. Indexes are a prime example, but they don’t solve every issue (a couple of which we’ll now explore).

The way Postgres handles consistency while serving concurrent queries is by maintaining multiple row versions in both the main part of a table (the “heap”) as well as in the indexes (docs). Old row versions take up space, at least until they are no longer needed, and the space can be reused. This extra space is commonly referred to as “bloat”. Below we’ll look into both heap bloat and index bloat, how they can affect query performance, and what you can do to both prevent and respond to issues.

Read on for a detailed explanation.
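As a rough way to see why bloat matters for read efficiency, here is a back-of-the-envelope sketch. The row counts and the rows-per-page figure are invented for illustration, and the model assumes dead row versions take up the same space as live ones, which is a simplification of how Postgres actually lays out pages.

```python
import math


def pages_to_scan(live_rows, dead_rows, rows_per_page):
    """Pages a full scan must read if dead versions still occupy space."""
    return math.ceil((live_rows + dead_rows) / rows_per_page)


# Hypothetical table: 1,000,000 live rows, 100 row versions per page.
clean = pages_to_scan(live_rows=1_000_000, dead_rows=0, rows_per_page=100)
bloated = pages_to_scan(live_rows=1_000_000, dead_rows=1_000_000, rows_per_page=100)

print(clean, bloated, bloated / clean)
```

In this simplified model, the query returns the same live rows in both cases but reads twice as many pages from the bloated table.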

Occasional Query Failures on a Small Table

Paul Randal troubleshoots an issue:

The table in question only had a few million rows of data in it, with a maximum row size of 60 bytes, and the query usually ran in a few seconds, but occasionally the query would ‘hang’ and would either be killed or take tens of minutes to run. Troubleshooting instrumentation when the issue happened showed no out-of-the-ordinary waits occurring, no pressure on the server, and the query plan generated when the query took a long time was essentially the same.

The only thing noticeable was that when the problem occurred, a column statistics update happened as part of query compilation, but with such a tiny table, how could that be the root cause of the issue? The calculated disk space for the row size and count worked out to be about 250MB, but with a statistics sample rate of only 4%, extended events showed an auto_stats event taking close to an hour!

Read on to learn the cause. I will admit that I did not get this one correct when I guessed what the cause could be.
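For a sense of scale, here is the kind of back-of-the-envelope arithmetic behind that size estimate. The row count is my assumption (the excerpt only says "a few million"), and the calculation ignores page and row overhead, so treat it as a sketch rather than Paul's actual numbers.

```python
# Assumed row count; the source only says "a few million rows".
row_count = 4_000_000
max_row_bytes = 60        # maximum row size from the excerpt
sample_percent = 4        # statistics sample rate from the excerpt

total_bytes = row_count * max_row_bytes
sampled_bytes = total_bytes * sample_percent // 100

print(f"estimated table size: {total_bytes / 1_000_000:.0f} MB")
print(f"data read at a {sample_percent}% sample: {sampled_bytes / 1_000_000:.1f} MB")
```

Under those assumptions, the table comes out in the same ballpark as the roughly 250MB in the excerpt, and a 4% sample would cover only about 10 MB of row data, which is why an hour-long auto_stats event looks so out of place.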

Tracking Unused Indexes in PostgreSQL

Semab Tariq wants to see which indexes are in use:

Indexes exist to speed up data access. They allow PostgreSQL to avoid full table scans, significantly reducing query execution time for read-heavy workloads.

From real production experience, we have observed that well-designed, targeted indexes can improve query performance by 5× or more, especially on large transactional tables.

However, indexes are not free.

The reasons why are very similar to what we have in SQL Server. The way to track utilization is a bit different, however.
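One common approach, which may or may not match what Semab does in the post, is to look at the `idx_scan` counter in PostgreSQL's `pg_stat_user_indexes` view. Here is a minimal sketch: the SQL string is something you would run through whatever client you prefer, while the `unused_indexes` helper and the sample rows are hypothetical and exist only to show the filtering logic.

```python
# Query against PostgreSQL's cumulative statistics view.
UNUSED_INDEX_SQL = """
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_relation_size(indexrelid) AS index_bytes
FROM pg_stat_user_indexes
ORDER BY idx_scan, index_bytes DESC;
"""


def unused_indexes(rows, max_scans=0):
    """Return (index name, bytes) for rarely scanned indexes, largest first."""
    hits = [r for r in rows if r["idx_scan"] <= max_scans]
    return sorted(
        ((r["indexrelname"], r["index_bytes"]) for r in hits),
        key=lambda pair: -pair[1],
    )


# Hypothetical result rows, shaped like the query output above.
sample_rows = [
    {"indexrelname": "orders_pkey", "idx_scan": 120_000, "index_bytes": 8_000_000},
    {"indexrelname": "orders_legacy_idx", "idx_scan": 0, "index_bytes": 64_000_000},
    {"indexrelname": "orders_tmp_idx", "idx_scan": 0, "index_bytes": 2_000_000},
]

candidates = unused_indexes(sample_rows)
print(candidates)
```

A zero scan count only covers the period since statistics were last reset, and an index that backs a constraint may still be needed even if nothing ever scans it, so treat the output as a list of candidates to investigate rather than a drop list.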

Thoughts on On-Disk Rowstore in SQL Server

Hugo Kornelis starts a series on storage structures:

When a query is slow, it is often caused by inefficient access to the data. So our tuning work very frequently comes down to figuring out how data was read, and then massaging our queries or database structures to get SQL Server to access the data in a more efficient way.

So we look at scans, seeks, and lookups. We know that scans are good when we access most of the data. Or, in the case of an ordered scan, to prevent having to sort the data. We know that seeks are preferred when there is a filter in the query. And we know that lookups represent a good tradeoff between better performance and too many indexes, but only if the filter is highly selective.

All of the above is true. And all of it is highly generalized. And hence, often, not true enough to be actually useful.

Read on for an overview of the most common option.

Indexes and COUNT() in SQL Server

Louis Davidson does some testing:

A few weeks ago, there was a LinkedIn post (I can’t find it anymore) that covered something about how indexes were used by COUNT in SQL. I think it may have been based on SQL Server, but I am not sure (it is rare that one of the SQL posts on LinkedIn mentions a platform). At the time, I went and tried a few of the mentioned cases and realized this was an interesting question: how does the COUNT aggregate use indexes when you use various different expressions.

Louis has a series of test cases and I got most of them right, though I wasn’t sure about one particular optimization.
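Part of why the expression matters is that the different forms of COUNT answer different questions, so the engine is not free to satisfy all of them from the same structure. Here is a small sketch of the semantics using Python's built-in `sqlite3` module. This is SQLite rather than SQL Server, the table `t` and its values are made up, and it says nothing about which index SQL Server would choose; it only shows that the three expressions return different results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER)")
# Four rows: one NULL and one duplicate value in column x.
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (1,), (None,), (2,)])

counts = conn.execute(
    "SELECT COUNT(*), COUNT(x), COUNT(DISTINCT x) FROM t"
).fetchone()
conn.close()

# COUNT(*) counts rows, COUNT(x) skips NULLs, COUNT(DISTINCT x) also collapses duplicates.
print(counts)
```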

JSON Data and Columnstore Indexes

Niko Neugebauer continues a series on columnstore:

Not since SQL Server 2008 has Microsoft added a new base data type to SQL Server, but in SQL Server 2025 they have added not one but two new data types: Vector and JSON. The first one (Vector) and the corresponding index (Vector Index) are described in detail in Columnstore Indexes – part 134 (“Vectors and Columnstore Indexes”), and this post is dedicated to the new JSON data type and the new JSON Index and their compatibility with Columnstore Indexes and the Batch Execution mode.

One common trait of the Vector and JSON Indexes is that both come with a large number of limitations and are only enabled under a “Preview” option, making them unsuitable for most production environments.

Niko has a somewhat-humorous and somewhat-infuriating table at the beginning describing just how much support columnstore indexes have for JSON data types.

And it is another example of the frustrating way in which Microsoft will release something before it’s even half-baked, demand consumer adoption to continue working on it, and then can the feature because people can’t use the not-even-half-baked feature in its current state. There’s a fine line between rapid prototyping and quick market feedback versus strangling products in the crib, and I think they’re pretty far onto the wrong side of things when it comes to most SQL Server functionality.
