Press "Enter" to skip to content

Category: Indexing

Indexing and Window Functions

I continue a series on window functions in SQL Server:

If you’ve been around the block with window functions, you’ve probably heard of the POC indexing strategy: Partition by, Order by, Covering. In other words, with a query, focus on the columns in the PARTITION BY clause (in order!), then the ORDER BY clause (again, in order!), and finally other columns in the SELECT clause to make the index covering (not in order! though it doesn’t hurt!).

But do read on to understand why this is not sufficient.

Comments closed

Why the Optimizer Doesn’t Look at Buffer Pool Data

Paul Randal has an explanation for us:

SQL Server has a cost-based optimizer that uses knowledge about the various tables involved in a query to produce what it decides is the most optimal plan in the time available to it during compilation. This knowledge includes whatever indexes exist and their sizes and whatever column statistics exist. Part of what goes into finding the optimal query plan is trying to minimize the number of physical reads needed during plan execution.

One thing I’ve been asked a few times is why the optimizer doesn’t consider what’s in the SQL Server buffer pool when compiling a query plan, as surely that could make a query execute faster. In this post, I’ll explain why.

This is an interesting post because it explains why the developers of the database engine would purposefully ignore something that could make things faster, but at a potentially devastating cost.

Comments closed

A Heap of Pain

Chad Callihan explains the dislike for heaps in SQL Server:

A table is considered a heap when it is created without a clustered index. Data isn’t in any type of ordered state. Some data is over here, some data is over there.

When you are inserting data into a heap, that data is tossed in wherever. Think of it like your junk drawer. It’s not organized into its own little sections. What do you do when you have something to add such as a pair of scissors or an old pen? You open the drawer, toss it in, and close it up without giving it a second thought.

Like Chad mentions, there are uses for heaps. And when you move to Azure Synapse Analytics, there are more uses for heaps. But with on-premises SQL Server, a heap is usually a mistake.

Comments closed

Index Usage across Replicas

Jess Pomfret does the math:

Last week, I was working on a project to analyse indexes on a database that was part of an availability group. The main goal was to find unused indexes that could be removed, but I was also interested in gaining an overall understanding of how the system was indexed.

Unused indexes not only take up disk space, but they also add overhead to write operations and require maintenance which can add additional load on your system.  We can also use this analysis to look for a high number of lookups which could indicate we need to adjust indexes slightly.

Click through to see how you can connect together index usage stats from the primary and secondary replicas of an availability group.

Comments closed

Fill Factor and When It Matters

Raul Gonzalez has a confession to make:

I love SQL Server internals, I do and I just said it.

Why? because thanks to all the tools, documentation and community members that share their knowledge, folks like me can understand how a super complex piece of software like a relational database engine works (or at least a small part of it).

Click through for a discussion of fill factor and one area where Raul thinks it falls short. I’m not sure that I agree but would need to think about it to give a clear explanation as to why.

Comments closed

Finding Queries with Index Hints

Aaron Bertrand wants to find queries using index hints:

Index hints should be a last resort, but we all know how that tends to go in SQL Server. If you have committed to using index hints, be aware that at some point you may have to update those explicit references. A specific index may get renamed or dropped, making the hints invalid, or a new and better index may be created. How do we find these references to remove, update, or at least document them? I see a lot of queries out there that will search the plan cache for index hints (often using fuzzy matching), or that just happen to use a specific index, but what about queries with explicit index hints that aren’t currently in the plan cache, and is index usage really what we’re after?

Click through for a detailed investigation, and also congratulations to Aaron for landing at Stack Overflow.

Comments closed

The Joys and Pains of Filtered Indexes

Chad Callihan takes us through filtered indexes:

We would all be happy if tables in our databases only contained the data we need. In reality, we tend to have data that needs kept for legal purposes or “just in case” something happens. Are you querying a table where queries only care about a fraction of the data?

When optimizing a query that is written to return a specific set of data, you may improve performance by creating a filtered index. Let’s look at an example of a filtered index and a warning that you may come across when using it.

I have a distinct love-hate relationship with filtered indexes. They’re so useful but so much of that utility comes with significant strings attached.

Comments closed

Index Maintenance and Pipeline Operation Scripts

Kevin Chant has a two-fer for us:

My first personal go-to script is one that has helped me out a lot over the years. Because I have used it a lot to identify missing indexes. I know there are a few different versions available online that you can use. However, I tend to use the one that comes with Glenn Berry’s Diagnostic Queries.

It is so easy to use. I’m not sharing the snippet of code on here because I want to encourage people to download the entire diagnostic script instead. Just download the script that is relevant for your version of SQL Server and search for ‘Missing indexes’.

Read the whole thing.

Comments closed

Tips on using Included Columns on Indexes

Chad Callihan shares some advice:

In my previous blog post, we saw how using INCLUDE to add a column to an index can make a difference compared to a key column. Let’s do a quick overview of INCLUDE and when it should be used.

Included columns are columns that can added to an index as non-key columns. They are only added to the leaf nodes of an index and have a bit more flexibility. Having trouble adding a particular data type to an index? Included columns can be data types unable to be added as key columns. Are you possibly maxed out on index key columns? Use INCLUDE to add any necessary columns.

Read on for an example and note the warning that you shouldn’t just add all of the columns to the INCLUDE clause.

Comments closed

Unique Constraints vs Unique Indexes

Erik Darling calls out unique key constraints:

I do love appropriately applied uniqueness. It can be helpful not just for keeping bad data out, but also help the optimizer reason about how many rows might qualify when you join or filter on that data.

The thing is, I disagree a little bit with how most people set them up, which is by creating a unique constraint.

Data modeling Kevin wants to use unique key constraints because that’s the correct thing to do. Implementation Kevin uses unique nonclustered indexes for the reasons Erik describes. Not mentioned in Erik’s post but potentially relevant is that operations on unique nonclustered indexes can be done online, whereas unique key constraint operations (creation and alteration via drop+create) are offline.

Comments closed