Press "Enter" to skip to content

Category: Indexing

Thoughts on On-Disk Rowstore in SQL Server

Hugo Kornelis starts a series on storage structures:

When a query is slow, it is often caused by inefficient access to the data. So our tuning work very frequently comes down to figuring out how data was read, and then massaging our queries or database structures to get SQL Server to access the data in a more efficient way.

So we look at scans, seeks, and lookups. We know that scans are good when we access most of the data. Or, in the case of an ordered scan, to prevent having to sort the data. We know that seeks are preferred when there is a filter in the query. And we know that lookups represent a good tradeoff between better performance and too many indexes, but only if the filter is highly selective.

All of the above is true. And all of it is highly generalized. And hence, often, not true enough to be actually useful.

Read on for an overview of the most common option.

Leave a Comment

Indexes and COUNT() in SQL Server

Louis Davidson does some testing:

A few weeks ago, there was a LinkedIn post (I can’t find it anymore) that covered something about how indexes were used by COUNT in SQL. I think it may have been based on SQL Server, but I am not sure (it is rare that one of the SQL posts on LinkedIn mentions a platform). At the time, I went and tried a few of the mentioned cases and realized this was an interesting question: how does the COUNT aggregate use indexes when you use various different expressions.

Louis has a series of test cases and I got most of them right, though I wasn’t sure about one particular optimization.

Leave a Comment

JSON Data and Columnstore Indexes

Niko Neugebauer continues a series on columnstore:

Not since SQL Server 2008 that Microsoft has added a new base data type to SQL Server, but in SQL Server 2025 they have added not 1 but whole 2 new data types – Vector and JSON. The first one (Vector) and the corresponding index (Vector Index) are described in details in the Columnstore Indexes – part 134 (“Vectors and Columnstore Indexes”) and this post is dedicated to the new JSON data type and the new JSON Index and their compatibility with the Columnstore Indexes and the Batch Execution mode.

One common trait for the Vector & JSON Indexes is that both come with a big number of limitations and they are all enabled under a “Preview” option, making them unsuitable for the most production environments.

Niko has a somewhat-humorous and somewhat-infuriating table at the beginning describing just how much support columnstore indexes have for JSON data types.

And it is another example of the frustrating way in which Microsoft will release something before it’s even half-baked, demand consumer adoption to continue working on it, and then can the feature because people can’t use the not-even-half-baked feature in its current state. There’s a fine line between rapid prototyping and quick market feedback versus strangling products in the crib, and I think they’re pretty far onto the wrong side of things when it comes to most SQL Server functionality.

Leave a Comment

The Cost of Heaps in SQL Server

Heaps killed Vlad Drumea’s dog and now it’s time to get revenge:

In this post I cover some unpleasant side-effects of using heaps in SQL Server for anything else than staging tables.

In the past couple of months I’ve been running into performance issues caused by the strange decision to have very active tables as heaps.
And, at this point, I’m on a personal vendetta against heaps in SQL Server.
As a result, I’m writing this blog post so that I can refer people to it the next time I’ll have to explain why heaps don’t match their use cases.

I, for one, wholeheartedly support Vlad’s vendetta. There are very rare circumstances in which it makes sense to use a heap in a non-temporary SQL Server table. Otherwise, define that clustered index.

Leave a Comment

Postgres Index Corruption after an OS Upgrade

Laurenz Albe has me concerned:

Most major Linux distributions have upgraded to the GNU C library version 2.28 or later. Therefore, there is a growing awareness that an operating system upgrade can lead to index corruption. However, rebuilding all your indexes can slow down the upgrade process considerably. In this article, I want to discuss how you can avoid rebuilding more indexes than necessary.

Read the whole thing if you are a Postgres DBA.

Comments closed

Page Compression on Heaps

Vlad Drumea explains why page compression might not give you quite what you expect:

I recently ran into SQL Server’s page compression being applied to a heap, and I figured I’d cover why that won’t work how some folks expect.

SQL Server’s page compression is really neat when applied on tables and indexes that are good candidates for it.
Even more so in cloud environments where storage costs can quickly add up.

Honestly, this is just a good reason to push for clustered indexes on all tables in SQL Server. I’ll call it good reason 5 out of 12.

Comments closed

What’s Missing in Columnstore Indexes

Niko Neugebauer has a list:

After spending some time thinking about the best way to come back to writing about Columnstore Indexes, after 5 and half years hiatus, I came to a conclusion that I have never published a post on what is still missing. With that in mind, I decided to mark my comeback to writing technical posts on my blog with rather simple post on the things that are needed, but did not made into the SQL Server – based engines so far (as of December 2025).

Niko has seven items on his list. I tend not to cover wish lists on Curated SQL, but when it’s Niko and columnstore indexes, I’m willing to make an exception.

Comments closed

Scan Types in PostgreSQL

Elizabeth Christensen has some neat imagery:

The secret to unlocking performance gains often lies not just in what you ask in a query, but in how Postgres finds the answer. The Postgres EXPLAIN system is great for understanding how data is being queried. One of secretes to reading EXPLAIN plans is understanding the type of scan done to retrieve the data. The scan type can be the difference between a lightning-fast response or a slow query.

Click through for the list, as well as images that clearly explain what’s happening.

Comments closed

Dealing with Index Bloat in Postgres

Kendra Little trims down the database after Thanksgiving:

Index bloat in Postgres can cause problems, but it’s easy to miss.

I’ve written about how vacuum problems can prevent PostgreSQL from using covering indexes, and index bloat is one of the things that can make vacuum struggle.

Here’s what you need to know about index bloat, how to find it, and how to fix it.

Read on to learn what Kendra means by index bloat, ways in which it can occur, why this is an issue, and how to identify and correct it.

Comments closed

Parallel Performance and SSMS Outputs

Joe Obbish looks at some execution plans:

Getting back to the query, it doesn’t look that offensive to me. The row mode sort is a parallel top N sort and the overall number of rows to return is low, so each thread can independently sort its rows and return 1000 locally sorted rows to the parent operator. This is about as good as it gets with parallel row mode sorting. This is a row mode only query so the operator times that you see are the sum of that operator’s work and its children. In terms of real work done by the query, the scan clocks in at 1.168 seconds and the sort clocks in at 0.84 seconds. The final accounting at the end by the parent Parallelism (Gather Streams) is misleading at best and an outright LIE at worst. There wasn’t 4 seconds of work done by this query. There was only 2 seconds. 

Joe looks at two separate things in this post: first, a way of trying to optimize OFFSET/FETCH style paging queries; and second, how the gather streams parallel operator can report wrong information.

Comments closed