Press "Enter" to skip to content

Category: Indexing

Eager Spools in SQL Server

Erik Darling describes the concept of eager spooling:

Not only does SQL Server create an index for you, it doesn’t really tell you about it. There’s no loud warning here.

It also throws that index away when the query is done executing. It only exists in tempdb while the query executes, and it’s only available to the query that builds it.

And boy, they sure can take a long time to build.

I enjoy and frequently use Erik’s depiction that an eager spool is SQL Server’s passive-aggressive way of telling you that you need an index.

Comments closed

Disabled Indexes Tell No Compression Tales

Eric Cobb gives us a warning around disabling indexes:

Here at work we have a very large, very intensive data load that disables and rebuilds indexes as part of the process. We recently added compression to many of the tables and indexes in the database because it was growing quite large (around 28TB at the time). After adding compression, we got the database size down to somewhere around 17TB.

So you can imagine our surprise when the DB size jumped back up to over 30TB after the last data load! In trying to figure out what happened I discovered that most of the data compression was gone.

That’s…not great. Eric shows us a demo as well and notes that it still applies to SQL Server 2019. I’d be apt to call it a bug, myself.

Comments closed

Finding Heaps in a Database

David Fowler has a quick one for us:

This is just going to be a quick post but I was asked the question the other day, how can I find all the tables in the database that don’t have a clustered index?

Tables without clustered indexes, known as heaps can give you quite a few different performance problems for various reasons. I don’t really want to go into those here, that could be a subject for another post but finding these heaps is actually very simple.

And David has a short script to show just how simple it is.

Comments closed

Dropping Unused Indexes in Azure SQL DB

Monica Rathbun gives an important lesson around tracking index utilization in Azure SQL Database:

If the index has not shown any utilization I investigate to determine if it is one that can be removed. However, this week something caught my attention. I was looking at a client’s indexes and noted the values for these were not as high as I would have expected. I know that these index statistics are reset upon every SQL Server Service restart, but in this case, I was working on an Azure SQL Database. which got me wondering exactly how that worked. With an Azure Virtual Machine or an on Prem SQL Server instance this is easy to figure out. But with an Azure SQL Database we do not have control over when restarts are done, and what about the Serverless offering (which pauses unutilized databases to reduce costs), how do those behave?  I really want to make sure before I remove any indexes from a database that I am examining the best data possible to make that decision. So, I did some digging.

Read on to see what Monica discovered.

Comments closed

Columnstore Index Maintenance

Ed Pollack continues a series on columnstore indexes:

Like with standard B-tree indexes, a columnstore index may be the target of a rebuild or reorganize operation. The similarities end here, as the function of each is significantly different and worth considering carefully prior to using either.

There are two challenges addressed by columnstore index maintenance:

1. Residual open rowgroups or open deltastores after write operations complete.
2. An abundance of undersized rowgroups that accumulate over time

Read on for the full story.

Comments closed

Making a Heap Molehill out of a Heap Mountain

Bert Wagner needed to move a large subset of a heap into its own heap:

Recently I had to filter out 1.2 billion records from a 3.5 billion row heap. Don’t ask me why this 3.5 billion row table is stored as a heap.

If the lack of a clustered index wasn’t bad enough, I also had some other restrictions:

I couldn’t add a clustered index (or any index for that matter) sorted on the key I needed to filter on. It wasn’t my system, and and I needed to access the 1.2 billion records sooner than it would take to get a clustered index approved and added. Even then, maybe the lack of a clustered index on this table is a feature and my request would be denied. Who knows.
I didn’t have a server that could store all of the data. At first I thought of copying all 3.5 billion rows to my own server and indexing it how I needed, but I didn’t have enough storage space anywhere to do that.
My connection to the server had a relatively short timeout set on it. This also couldn’t be changed. If I couldn’t copy all 3.5 billion rows because of storage, I also couldn’t copy all 1.2 billion records in one fell swoop because the connection would timeout.

I’m also glad to see that Bert is back in action and look forward to seeing these posts and videos pop up again.

Comments closed

Missing Indexes Don’t Tell the Whole Story

Erik Darling explains some of the shortcomings of the missing indexes DMV:

The problem with relying on any data point is that when it’s not there, it can look like there’s nothing to see.

Missing indexes requests are one of those data points. Even though there are many reasons why they might not be there, sometimes it’s not terribly clear why one might not surface.

That can be annoying if you’re trying to do a general round of tuning on a server, because you can miss some easy opportunities to make improvements.

Read on for a few examples of where the results can betray you.

Comments closed

Learning About Index Utilization with dbatools

Ben Miller takes us through a way to know your data:

You have many tables in your databases and you want to know how they are used. There are DMVs for index usage stats which will tell you about like sys.dm_db_index_usage_stats and querying them is insightful, but how do the stats change over time? These stats are reset when the instance is restarted and it is good to know that you have 2000 seeks and 500 scans of the index, but when did they happen? Was it on a common day? Common hour?

Ben has a way to help you figure that out.

Comments closed

Automating Columnstore Index Partition Rebuilds

Brett Powell has a procedure for us:

This post provides an example of a stored procedure which A) identifies the last two partitions of an Azure Synapse Analytics SQL pool table (which uses the columnstore index (default)) and B) rebuilds the index for these two partitions. Additionally, a sample PowerShell-based Azure Automation runbook is included for scheduling the execution of this procedure.

This post follows up on the previous post regarding a Power BI template to be used to analyze the health or quality of a columnstore index. For example, the template shared may help you find that the last one or two partitions such as partition numbers 39 and 40 out of 40 partitions may have many open (uncompressed) and/or not-optimized rowgroups. The cause of these low quality partitions could be that recent and ongoing data processing events are impacting these partitions (inserts,updates). Perhaps partitions 39 and 40 refer to the current and prior month for example.

Read on for the link to the script, as well as details on how to use it.

Comments closed

Key Lookups and Self-Joins

Erik Darling has an interesting method for eliminating key lookups:

This post isn’t going to go terribly deep into anything, but I do want to make a few things about them more clear, because I don’t usually see them mentioned anywhere.

1. Lookups are joins between two indexes on the same table
2. Lookups can only be done via nested loops joins
3. Lookups can’t be moved around in the execution plan

I don’t want you to think that every lookup is bad and needs to be fixed, but I do want you to understand some of the limitations around optimizing them.

Definitely worth the read.

Comments closed