Press "Enter" to skip to content

Category: Indexing

Using a Filtered Index to Prevent Duplicate Rows

Jared Westover solves a problem:

Have you ever needed to prevent duplicate rows from popping up in a table, but the built-in unique constraint in SQL Server falls short? I ran into a table with duplicate rows, but we couldn’t delete them, at least for now. Ideally, you would delete all the duplicates and call it a day. If life were this simple, it would be boring. We needed to prevent new ones from showing up and keep the existing ones. The problem with a unique constraint is that it applies to all the rows in a column.

You can also use a filtered index to prevent the problem in the first place. This was actually an interview question I liked to give: with filtered indexes, you can ensure each non-NULL value was unique, but that you could have as many rows with NULL as your dataset required.

Comments closed

Optimized Column Order for Indexes

Eitan Blumin talks indexing:

SQL Server performance optimization is not a simple topic, and index design plays a pivotal role in it, determining the efficiency of database queries.

One key aspect that often influences performance is the order of columns in an index.

In this guide, I’ll use my real-world experience from our consulting jobs to examine the thinking process behind selecting the best column sequence for an index, the logic behind the decisions, and offer some practical insights for optimal database performance.

Read on to learn more about indexing strategy. And this is probably a good time to remind people that the missing index DMV’s column order has as its sole basis the column’s ordinal value. You can, and often will, do better than the missing index DMV recommendation by going through Eitan’s exercise.

Comments closed

Indexing for Read-Scale Databases

Jose Manuel Jurado Diaz shares a customer case:

Today, I worked on a service request that our customer has a Business Critical database with 4 vCores and Read-Scale Out enabled. Our customer noticed several performance issues using Read-Scale Out database and I would like to explain several lessons learned found during the troubleshooting steps.

Click through for notes on troubleshooting and improving performance.

Comments closed

A Deep Dive into Covering and Non-Covering Indexes

Etienne Lopes goes deeper than I thought:

A few days ago a client brought me a question, he had two tables with the same data, one in each database (same SQL version and compatibility level in different environments with similar infrastructure and configurations). Both tables had an extremely selective column and both had indexes in that column. The thing is that when running a particular query on one Database, let’s call it DB1 here, it was really fast whereas running the same query on the other database (DB2) it was a lot slower. The subset of rows that matched the WHERE clause was the same and it was really small in both cases (since the tables had the same data).

“Ok, let’s have a look at that”, I said, and when doing so, the query running on DB1 was doing a Clustered Index Seek whereas the same query in DB2 was doing a Table Scan. Didn’t I say that both tables had indexes in that column?

Read on for the post. Etienne warns that it’s a big boy of a post, and it covers a lot, including “seek” operations that are scans in disguise, index levels, and more.

Comments closed

Writing Conditional JOIN and WHERE Clauses

Erik Darling has Blondie stuck in my head:

The OR operator is a perfectly valid one to use in SQL statements. If you use an IN clause, there’s a reasonable chance that the optimizer will convert it to a series of OR statements.

For example, IN(1, 2, 3) could end up being = 1 OR = 2 OR = 3 without you doing a darn thing. Optimizers are funny like that. Funny little bunnies.

The problem generally isn’t when asking for IN or OR for a single column, with a list of literal values, the problem is usually when you:

  • Use OR across multiple where clause columns
  • Use OR in a join clause of any variety
  • Use OR to handle NULL parameters or variables

This is an excellent, detailed article and well worth the read.

Comments closed

Transactional Replication Slow: Check Missing Indexes

Lori Brown provides a good tip:

Have you ever had transactional replication mysteriously start showing significant latency at a subscriber server?  If so, check to see if the primary keys from the publisher database are missing on the subscriber database!

Replication was showing long latency while the publisher and subscriber servers were not heavily utilized.  Microsoft generated stored procedures that are used to send INSERT, UPDATE and DELETE operations from the publisher database to the subscriber database had no indexes to help query performance to the subscriber tables.  Since there were no indexes that matched the keys of the statements pushing through data changes, tables were being fully scanned and replication statements were piling up.  In other words, the subscriber was missing indexes. 

Read on to see how you can check this and also how you can quickly script those missing indexes.

Comments closed

SQL Server Index Included Columns and Log Bloat

Forrest McDaniel explains that TANSTAAFL:

Let me share the conclusion now: included columns in an index always get included in the log record, even if they weren’t modified. The same is not true for the clustered index. This can really matter if log generation is important to your environment, e.g. using an AG.

Click through for the proof. We often think of included columns on indexes as being nearly free performance gains, but “nearly” is carrying a bit of water here. This shouldn’t dissuade you from using included columns on indexes, but is a valid reason not to include all 50 non-index columns in a table “just in case.”

Comments closed

Parallel Vector Index Rebuild in Postgres

Semab Tariq takes a look at parallel index building in pgvector:

Parallel Index Build refers to the capability to build indexes using parallel processing. In simpler terms, it means that multiple workers or threads can be utilized simultaneously to create an index, which can significantly speed up the index creation process.

When performing an index build operation, PostgreSQL can divide the work among multiple parallel workers, each responsible for building a portion of the index.

Read on to learn more about this bit of functionality in pgvector 0.6 and the performance gains you can get from it.

Comments closed