Press "Enter" to skip to content

Category: Indexing

Optimized Column Order for Indexes

Eitan Blumin talks indexing:

SQL Server performance optimization is not a simple topic, and index design plays a pivotal role in it, determining the efficiency of database queries.

One key aspect that often influences performance is the order of columns in an index.

In this guide, I’ll use my real-world experience from our consulting jobs to examine the thinking process behind selecting the best column sequence for an index, the logic behind the decisions, and offer some practical insights for optimal database performance.

Read on to learn more about indexing strategy. And this is probably a good time to remind people that the missing index DMV’s column order has as its sole basis the column’s ordinal value. You can, and often will, do better than the missing index DMV recommendation by going through Eitan’s exercise.

Comments closed

Indexing for Read-Scale Databases

Jose Manuel Jurado Diaz shares a customer case:

Today, I worked on a service request that our customer has a Business Critical database with 4 vCores and Read-Scale Out enabled. Our customer noticed several performance issues using Read-Scale Out database and I would like to explain several lessons learned found during the troubleshooting steps.

Click through for notes on troubleshooting and improving performance.

Comments closed

A Deep Dive into Covering and Non-Covering Indexes

Etienne Lopes goes deeper than I thought:

A few days ago a client brought me a question, he had two tables with the same data, one in each database (same SQL version and compatibility level in different environments with similar infrastructure and configurations). Both tables had an extremely selective column and both had indexes in that column. The thing is that when running a particular query on one Database, let’s call it DB1 here, it was really fast whereas running the same query on the other database (DB2) it was a lot slower. The subset of rows that matched the WHERE clause was the same and it was really small in both cases (since the tables had the same data).

“Ok, let’s have a look at that”, I said, and when doing so, the query running on DB1 was doing a Clustered Index Seek whereas the same query in DB2 was doing a Table Scan. Didn’t I say that both tables had indexes in that column?

Read on for the post. Etienne warns that it’s a big boy of a post, and it covers a lot, including “seek” operations that are scans in disguise, index levels, and more.

Comments closed

Writing Conditional JOIN and WHERE Clauses

Erik Darling has Blondie stuck in my head:

The OR operator is a perfectly valid one to use in SQL statements. If you use an IN clause, there’s a reasonable chance that the optimizer will convert it to a series of OR statements.

For example, IN(1, 2, 3) could end up being = 1 OR = 2 OR = 3 without you doing a darn thing. Optimizers are funny like that. Funny little bunnies.

The problem generally isn’t when asking for IN or OR for a single column, with a list of literal values, the problem is usually when you:

  • Use OR across multiple where clause columns
  • Use OR in a join clause of any variety
  • Use OR to handle NULL parameters or variables

This is an excellent, detailed article and well worth the read.

Comments closed

Transactional Replication Slow: Check Missing Indexes

Lori Brown provides a good tip:

Have you ever had transactional replication mysteriously start showing significant latency at a subscriber server?  If so, check to see if the primary keys from the publisher database are missing on the subscriber database!

Replication was showing long latency while the publisher and subscriber servers were not heavily utilized.  Microsoft generated stored procedures that are used to send INSERT, UPDATE and DELETE operations from the publisher database to the subscriber database had no indexes to help query performance to the subscriber tables.  Since there were no indexes that matched the keys of the statements pushing through data changes, tables were being fully scanned and replication statements were piling up.  In other words, the subscriber was missing indexes. 

Read on to see how you can check this and also how you can quickly script those missing indexes.

Comments closed

SQL Server Index Included Columns and Log Bloat

Forrest McDaniel explains that TANSTAAFL:

Let me share the conclusion now: included columns in an index always get included in the log record, even if they weren’t modified. The same is not true for the clustered index. This can really matter if log generation is important to your environment, e.g. using an AG.

Click through for the proof. We often think of included columns on indexes as being nearly free performance gains, but “nearly” is carrying a bit of water here. This shouldn’t dissuade you from using included columns on indexes, but is a valid reason not to include all 50 non-index columns in a table “just in case.”

Comments closed

Parallel Vector Index Rebuild in Postgres

Semab Tariq takes a look at parallel index building in pgvector:

Parallel Index Build refers to the capability to build indexes using parallel processing. In simpler terms, it means that multiple workers or threads can be utilized simultaneously to create an index, which can significantly speed up the index creation process.

When performing an index build operation, PostgreSQL can divide the work among multiple parallel workers, each responsible for building a portion of the index.

Read on to learn more about this bit of functionality in pgvector 0.6 and the performance gains you can get from it.

Comments closed

Indexes in pgvector

Semab Tariq explains index options for Postgres’s vector database:

This blog is part of our pgvector blog series. If you haven’t checked out the first blog, I recommend going through it first, where I dive into important concepts of pgvector and AI applications in detail. I provided a real-world example illustrating how you can perform searches based on the meaning of words rather than the words themselves. You can find it on the link here

In this blog, We will explore additional details about the indexes supported in pgvector. We will discuss how indexes are built in the backend, and the various parameters associated with these indexes, and guide you on selecting the most suitable index based on your requirements. Finally, we will assess which index offers the best recall rate for our search query across our dataset of one million records sourced from Wikipedia. Let’s dive into that

Click through to learn more about the two index types available.

Comments closed

Clustered Index Rebuilds and Transaction Log Space

Chad Callihan talks about a potential problem:

Of all the problems that can cause a database transaction log to fill up, perhaps one lesser-known cause is a large clustered index rebuild. Let’s look at a scenario where this issue could cause a headache.

Click through for the scenario. I will say that I disagree with reorganizing the index under pretty much any circumstances. Jeff Moden, in a long and deep but extremely interesting presentation he gave to us at TriPASS three years ago, laid out why that is. I do agree that rebuilding a clustered index should be uncommon and may be a sign of a data modeling problem.

The other thing to note is that this problem is bigger when you rebuild online and your data drive is the same as your log drive. When you rebuild an index online, SQL Server needs enough space to make a copy of the data, so the rule of thumb is that you’ll have 2x the normal disk space requirements during online rebuild. It’s not necessarily exactly 2x because of fragmentation, compression, and other factors, but that’s why it’s a rule of thumb and not a “guarantee or your money back” type of thing.

Comments closed