Press "Enter" to skip to content

Category: Indexing

Missing Indexes Don’t Tell the Whole Story

Erik Darling explains some of the shortcomings of the missing indexes DMV:

The problem with relying on any data point is that when it’s not there, it can look like there’s nothing to see.

Missing indexes requests are one of those data points. Even though there are many reasons why they might not be there, sometimes it’s not terribly clear why one might not surface.

That can be annoying if you’re trying to do a general round of tuning on a server, because you can miss some easy opportunities to make improvements.

Read on for a few examples of where the results can betray you.

Comments closed

Learning About Index Utilization with dbatools

Ben Miller takes us through a way to know your data:

You have many tables in your databases and you want to know how they are used. There are DMVs for index usage stats which will tell you about like sys.dm_db_index_usage_stats and querying them is insightful, but how do the stats change over time? These stats are reset when the instance is restarted and it is good to know that you have 2000 seeks and 500 scans of the index, but when did they happen? Was it on a common day? Common hour?

Ben has a way to help you figure that out.

Comments closed

Automating Columnstore Index Partition Rebuilds

Brett Powell has a procedure for us:

This post provides an example of a stored procedure which A) identifies the last two partitions of an Azure Synapse Analytics SQL pool table (which uses the columnstore index (default)) and B) rebuilds the index for these two partitions. Additionally, a sample PowerShell-based Azure Automation runbook is included for scheduling the execution of this procedure.

This post follows up on the previous post regarding a Power BI template to be used to analyze the health or quality of a columnstore index. For example, the template shared may help you find that the last one or two partitions such as partition numbers 39 and 40 out of 40 partitions may have many open (uncompressed) and/or not-optimized rowgroups. The cause of these low quality partitions could be that recent and ongoing data processing events are impacting these partitions (inserts,updates). Perhaps partitions 39 and 40 refer to the current and prior month for example.

Read on for the link to the script, as well as details on how to use it.

Comments closed

Key Lookups and Self-Joins

Erik Darling has an interesting method for eliminating key lookups:

This post isn’t going to go terribly deep into anything, but I do want to make a few things about them more clear, because I don’t usually see them mentioned anywhere.

1. Lookups are joins between two indexes on the same table
2. Lookups can only be done via nested loops joins
3. Lookups can’t be moved around in the execution plan

I don’t want you to think that every lookup is bad and needs to be fixed, but I do want you to understand some of the limitations around optimizing them.

Definitely worth the read.

Comments closed

Optimizing Read Performance of Heaps

Uwe Ricken continues a series on heaps in SQL Server:

Heaps are not necessarily the developer’s favourite child, as they are not very performant, especially when it comes to selecting data (most people think so!). Certainly, there is something true about this opinion, but in the end, it is always the workload that decides it. In this article, I describe how a Heap works when data are selected. If you understand the process in SQL Server when reading data from a Heap, you can easily decide if a Heap is the best solution for your workload.

Uwe hits on a couple of the (few) use cases where heap performance can match and sometimes surpass clustered index performance.

Comments closed

Finding Index Usage Stats in Query Store

Grant Fritchey gives us another option for determining whether an index is in use:

One of the most frequent questions you’ll hear online is how to determine if a particular index is in use. There is no perfect answer to this question. You can look at the sys.dm_db_index_usage_stats to get a pretty good picture of whether or not an index is in use. However, this DMV has a few holes through which you could be mislead.

I thought of another way to get an idea of how and where an index is being used. This is also a flawed solution, but, still, an interesting one.

What if we queried the information in Query Store?

Be sure to read Grant’s warning before jumping into this, but at least it gives us another option, as well as a better understanding of which queries are using particular indexes.

Comments closed

Using Computed Columns to Avoid Scans without Changing Queries

Andy Mallon shares a trick you don’t want to use too often, but can get you out of a pinch:

We’ve all been there. You’ve got a query where the JOIN or WHERE predicate is not SARGable. You’ve read about how this can be a problem, and how bad it is for performance.

Alas, you cannot change the query. Sometimes this reason is political, sometimes it’s because you’ve got a third-party app and simply don’t have access to the code. But you do have access to the database…

This is the type of thing you learn about and use maybe twice in your career, and then you get frustrated with the third-party vendor which won’t fix their code.

Comments closed

Removing Redundant Indexes

Guy Glantser helps find and remove redundant indexes in SQL Server:

For some reason, which I have never understood, SQL Server allows you to create duplicate indexes on the same object (table or view). You can create as many non-clustered indexes as you like with the exact index keys and included columns as well as the exact index properties. The only difference between the indexes would be the index ID and the index name. This is a very undesirable situation, because there is clearly no benefit from having the same index more than once, but on the other hand there is quite a lot of overhead that each index incurs. The overhead includes the storage consumed by the index, the on-going maintenance during DML statements, the periodical rebuild and/or reorganize operations, and the additional complexity that the optimizer has to deal with when evaluating possible access methods to the relevant table or view.

I don’t fully agree with Guy’s definition of redundancy, but it’s more a quibble than anything else—if I have an index on (A,B,C) and an index on (A,B), it might seem redundant, but there are cases when it isn’t. For example, suppose C is a large NVARCHAR column such that we barely fit (A,B,C) into the window for an index (1700 bytes in SQL Server 2016, 900 in prior versions), but A and B are INT types. If we have a lot of cases where we need (A,B) but not C, that second index is definitely not redundant.

Regardless, click through for Guy’s argument and a script to help you find potentially redundant indexes.

Comments closed

Filtered Indexes and the Optimizer

Paul White covers a couple of issues around filtered indexes:

This is a many-to-many merge join, where the execution engine must keep track of duplicates from the outer input in a worktable, and rewind as necessary. Duplicates? We are scanning a unique index!

It turns out the optimizer does not know that a filtered unique index produces unique values. This is a one-to-one join, but the optimizer costs it as if it were many-to-many. The higher estimated cost of a many-to-many merge join explains why a hash join plan is chosen.

Read the whole thing.

Comments closed

When a Non-Clustered Index on Clustered Columns Makes Sense

Allen White gives us a scenario where adding a non-clustered index which is the same column as the clustered index can make sense:

Recently I was asked about adding a non-clustered index to a table (let’s call it Images) with just one column. It had been added in the development database and it improved performance dramatically. I looked at it and it had the same key as the clustered index on that table.

In reviewing the query I saw that Images was joined to the other tables in the query, but none of the columns were used, so Images was joined to ensure that values from the other tables had rows in Images. The query plan shows a significantly higher number of reads against Images without the new NCI (non-clustered index) than when it’s present.

I do agree that this can help—as we obviously see. The backseat query tuner in me wonders if maybe there’s another way to write the query to prevent the scan by using CROSS APPLY, but that’d only help if they were getting a small percentage of rows from the parent table expression built from the combination of the clustered index scan and index seek in the second example.

Comments closed