Indexing – Curated SQL

Writing Conditional JOIN and WHERE Clauses

Published 2024-04-26 by Kevin Feasel

Erik Darling has Blondie stuck in my head:

The OR operator is a perfectly valid one to use in SQL statements. If you use an IN clause, there’s a reasonable chance that the optimizer will convert it to a series of OR statements.
For example, IN(1, 2, 3) could end up being = 1 OR = 2 OR = 3 without you doing a darn thing. Optimizers are funny like that. Funny little bunnies.
The problem generally isn’t when asking for IN or OR for a single column, with a list of literal values, the problem is usually when you:
Use OR across multiple where clause columns
Use OR in a join clause of any variety
Use OR to handle NULL parameters or variables

This is an excellent, detailed article and well worth the read.

Transactional Replication Slow: Check Missing Indexes

Published 2024-04-12 by Kevin Feasel

Lori Brown provides a good tip:

Have you ever had transactional replication mysteriously start showing significant latency at a subscriber server? If so, check to see if the primary keys from the publisher database are missing on the subscriber database!
Replication was showing long latency while the publisher and subscriber servers were not heavily utilized. Microsoft generated stored procedures that are used to send INSERT, UPDATE and DELETE operations from the publisher database to the subscriber database had no indexes to help query performance to the subscriber tables. Since there were no indexes that matched the keys of the statements pushing through data changes, tables were being fully scanned and replication statements were piling up. In other words, the subscriber was missing indexes.

Read on to see how you can check this and also how you can quickly script those missing indexes.

SQL Server Index Included Columns and Log Bloat

Published 2024-04-09 by Kevin Feasel

Forrest McDaniel explains that TANSTAAFL:

Let me share the conclusion now: included columns in an index always get included in the log record, even if they weren’t modified. The same is not true for the clustered index. This can really matter if log generation is important to your environment, e.g. using an AG.

Click through for the proof. We often think of included columns on indexes as being nearly free performance gains, but “nearly” is carrying a bit of water here. This shouldn’t dissuade you from using included columns on indexes, but is a valid reason not to include all 50 non-index columns in a table “just in case.”

Parallel Vector Index Rebuild in Postgres

Published 2024-03-29 by Kevin Feasel

Semab Tariq takes a look at parallel index building in pgvector:

Parallel Index Build refers to the capability to build indexes using parallel processing. In simpler terms, it means that multiple workers or threads can be utilized simultaneously to create an index, which can significantly speed up the index creation process.
When performing an index build operation, PostgreSQL can divide the work among multiple parallel workers, each responsible for building a portion of the index.

Read on to learn more about this bit of functionality in pgvector 0.6 and the performance gains you can get from it.

Comments closed

Indexes in pgvector

Published 2024-02-27 by Kevin Feasel

Semab Tariq explains index options for Postgres’s vector database:

This blog is part of our pgvector blog series. If you haven’t checked out the first blog, I recommend going through it first, where I dive into important concepts of pgvector and AI applications in detail. I provided a real-world example illustrating how you can perform searches based on the meaning of words rather than the words themselves. You can find it on the link here
In this blog, We will explore additional details about the indexes supported in pgvector. We will discuss how indexes are built in the backend, and the various parameters associated with these indexes, and guide you on selecting the most suitable index based on your requirements. Finally, we will assess which index offers the best recall rate for our search query across our dataset of one million records sourced from Wikipedia. Let’s dive into that

Click through to learn more about the two index types available.

Comments closed

Clustered Index Rebuilds and Transaction Log Space

Published 2024-02-23 by Kevin Feasel

Chad Callihan talks about a potential problem:

Of all the problems that can cause a database transaction log to fill up, perhaps one lesser-known cause is a large clustered index rebuild. Let’s look at a scenario where this issue could cause a headache.

Click through for the scenario. I will say that I disagree with reorganizing the index under pretty much any circumstances. Jeff Moden, in a long and deep but extremely interesting presentation he gave to us at TriPASS three years ago, laid out why that is. I do agree that rebuilding a clustered index should be uncommon and may be a sign of a data modeling problem.

The other thing to note is that this problem is bigger when you rebuild online and your data drive is the same as your log drive. When you rebuild an index online, SQL Server needs enough space to make a copy of the data, so the rule of thumb is that you’ll have 2x the normal disk space requirements during online rebuild. It’s not necessarily exactly 2x because of fragmentation, compression, and other factors, but that’s why it’s a rule of thumb and not a “guarantee or your money back” type of thing.

Comments closed

Search Patterns in T-SQL

Published 2024-02-23 by Kevin Feasel

Erik Darling puts on the fedora and grabs the bullwhip:

First, what you should not do: A universal search string:
The problem here is somewhat obvious if you’ve been hanging around SQL Server long enough. Double wildcard searches, searching with a string type against numbers and dates, strung-together OR predicates that the optimizer will hate you for.
These aren’t problems that other things will solve either. For example, using CHARINDEX or PATINDEX isn’t a better pattern for double wildcard LIKE searching, and different takes on how you handle parameters being NULL don’t buy you much.

Read on for an example of a terrible search query, a mediocre search query, a good search query, and a possible unicorn: an actually valid reason to use a non-clustered columnstore index.

Comments closed

Indexing for Window Functions

Published 2024-02-21 by Kevin Feasel

Erik Darling talks window functions:

A lot of the time, the answer to performance issues with ranking windowing functions is simply to get Batch Mode involved. Where that’s not possible, you may have to resort to adding indexes.
Sometimes, even with Batch Mode, there is additional work to be done, but it really does get a lot of the job done.
In this post I’m going to cover some of the complexities of indexing for ranking windowing functions when there are additional considerations for indexing, like join and where clause predicates.

Click through for an in-depth article with plenty of good information.

Comments closed

Indexing for Sorted Data

Published 2024-02-19 by Kevin Feasel

Erik Darling digs into indexes and sorts:

Without things sorted the way you’re looking for them, it’s a lot like hitting shuffle until you get to the song you want. Who knows when you’ll find it, or how many clicks it will take to get there.
The longer your playlist is, well, you get the idea. And people get all excited about Skip Scans. Sheesh.
Anyway, let’s look at poor optimizer choices, and save the poor playlist choices for another day.

Click through for an interesting problem and solution.

Comments closed

Weirdness with Aggregation

Published 2024-02-15 by Kevin Feasel

Erik Darling digs into a problem. Part 1 sets up the scenario:

Here’s the query plan, which yes, you’re reading correctly, runs for ~23 seconds, fully joining both tables prior to doing the final aggregation.
I’m showing you a little extra here, because there are missing index requests that the optimizer asks for, but we’ll talk about those in tomorrow’s post.
The wait stats for this query, since it’s running in Batch Mode, are predictably HT-related.

Part 2 covers those missing indexes:

I’ve taken a small bit of artistic license with them.
The crappy thing is… They really do not help and in some cases things get substantially worse.

Maybe it’s because it’s early and I’m trying to compile things in my head rather than actually trying it out, but it seems like a combo of CTE + CROSS APPLY or a pair of CROSS APPLY statements could work better (especially with a good index), assuming that join doesn’t need to be in place. Given the query as it is, with two MAX() aggregations and no GROUP BY clause, that could be an avenue for improvement, though one I have not actually tested. Nonetheless, read both of Erik’s posts.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Category: Indexing