Press "Enter" to skip to content

Category: Indexing

So You Want to Index

Erik Darling has an indexing strategy for querulous normies:

Most queries will have a where clause. I’ve seen plenty that don’t. Some of’em have surprised the people who developed them far more than they surprised me.

But let’s start there, because it’s a pretty important factor in how you design your indexes. There are all sorts of things that indexes can help, but the first thing we want indexes to do in general is help us locate data.

None of this is groundbreaking but Erik does a really good job of laying out the order in which you want to consider specific factors.

Comments closed

Index Creation with DROP_EXISTING

Monica Rathbun takes us through the DROP_EXISTING option when modifying an index:

When you are making changes to an existing Non-Clustered index SQL Server provides a wide variety of options. One of the more frequently used methods is DROP EXISTING; in this post you will learn all about that option. This option automatically drops an existing index after recreating it, without the index being explicitly dropped. Let us take a moment understand the behavior of this choice.

What I really want is DROP_IF_EXISTS. I want idempotent commands: if I run it once or a thousand times, I end up in the same state whether there was an index there at the start or not (or if attempt #793 failed due to running out of sort space in tempdb or something, leaving me with no index). DROP_EXISTING is only idempotent if the index already existed, but then you have to ask, why is it important if an index of that name is already there? The important part of the statement is that I want an end state which includes this index in this form.

Comments closed

The Limits of Filtered Indexes

Erik Darling lays out the pros and cons of filtered indexes:

Filtered indexes are really interesting things. Just slap a where clause on your index definition, and you can do all sorts of helpful stuff:

– Isolate hot data
– Make soft delete queries faster
– Get a histogram specific to the span of data you care about

Among other things, of course. There are some annoying things about them though.

– They only work with specific ANSI options
– If you don’t include the filter definition columns in the index, it might not get used
– They only work when queries use literals, not parameters or variables

Click through for examples of them in action. I would definitely like to see improvements to filtered indexes along the lines that Erik mentions. They have so much potential, but are really held back by those limitations.

Comments closed

Deduplicating Missing Index Requests

Erik Darling trains us in a skill:

Going back to our queries and our index requests, all the queries have two things in common:

– They filter on OwnerUserId
– They order by Score

There are of course other elements in the where clause to attend to, but our job is to come up with one index that helps all of our queries.

This gets a lot easier if you know the queries. If all you have is the set of index details, you can make a sane guess but the chance of being wrong is much greater.

Comments closed

PFS Contention and Heaps

Uwe Ricken continues a series on heaps in SQL Server:

The PFS page “can” become a bottleneck for a heap if many data records are entered in the heap in the shortest possible time. How often the PFS page has to be updated depends mostly on the data record’s size to be saved.

This procedure does not apply to clustered indexes since data records in an index must ALWAYS be “sorted” into the data volume according to the defined index value. Therefore, the search for a “free” space is not carried out via the PFS page but via the value of the key attribute!

Read on for more detail.

Comments closed

Pruning Indexes on a Table

John McCormack takes us through an index pruning exercise:

Your SQL Server indexes can’t always be perfect. What I mean by that is you can’t index for every query on a busy SQL server. In this post I’m going to describe just one table from a server I have been tuning recently. It had 26 indexes, and that was not unusual for this database. This most likely happened over time as a result of developers creating an index per query using the table, as well as following some perceived best practices (with the best of intentions).

John got it down from 26 to 11 and shares thoughts on how. If you have that many separate indexes, it most likely stems from a failure in normalization—it’s uncommon that a properly-described table has that many unique access patterns. And that will often lead you to a hard floor which includes more indexes than you’d like without reimagining the table and going through a lot of developer pain.

Comments closed

Tracking Cosmos DB Re-Indexing Progress

Hasan Savran wants information:

Indexes let your queries run faster. When you need to adjust your indexing policies, database engines re-indexes your data respecting to your changes. In Cosmos DB, when you change your indexing policies, database engine truncates all your indexes and starts to reindex all your indexes from scratch. You do not want to change your indexing policies when your application is busy. Because your queries can not use the dropped indexes, queries will take longer, and they will cost more Request Units. Also, your queries might not return all the data they supposed to. You can read me my older post about indexes in Cosmos DB.

     You may want to monitor re-indexing progress; you may want to disable your application until indexing is completed or warn your team about the re-indexing progress. You can check the re-indexing progress only from SDK, that means you need to write your own code to accomplish this. I have the following code which checks the progress every second. If progress is at %100 then it quits, otherwise it continues to check progress every second until it receives 100 as result.

Hasan has provided us with a script, so check that out.

Comments closed

Choosing the Right Index and Partition in Dedicated SQL Pools

Tsuyoshi Matsuzaki gives us some advice on indexing and partitioning data in Azure Synapse Analytics dedicated SQL pools:

Designing index for a table is so primitive and important for better performance.
There’s no “one answer for any case”. You should choose right index for a table depending on the size, usage, query patterns, and cardinality.

In order to help you understand pros/cons in each indexes, I’ll show you each pictures illustrating intuitive structures of indexes available in Synapse Analytics.

Because dedicated SQL pools aren’t the same as the SQL Server box product, it’s important to go in with the understanding that indexing won’t be exactly the same as on-premises or in Azure SQL Database.

Comments closed

Finding Indexes Not in Use

Dennes Torres takes us through a few iterations of a query to find indexes not in use:

It doesn’t matter if you are trying to remove indexes for good reasons or just to work around a bad environment, let’s see in more details how to find which indexes doesn’t have enough usage to justify their existance.

First, some basic definitions, without going into many details:

Index Seek: That’s the best and desirable use of the index. It means the index tree is being used to go directly to the records we need.

Index Scan: Not so good as an index seek, so it could be better. However, sometimes even an index scan is good, a non clustered index scan means the pages of that index are smaller an better for a scan than the pages of the clustered index. There are many variations that makes an index scan good, but most times you don’t need to reach this level of analysis, you may reach your objective only analysing index seeks.

Update: When the fields are updated (update/insert/delete) all indexes which contain those fields need to be updated as well. Indexes are a balance: We increase performance on reading and suffer a bit more when writting. The problem is when the writting happens more than the reading.

Read on to see Dennes’s query evolve and bring important information to the table. For example, it’s not just how often a particular index gets used; it’s also how important the queries are which use this index. An index may only run once a month, but if it turns the most important report the CEO cares about from running in 4 hours to running in 4 seconds, you bet that index is staying.

Comments closed