Category: Indexing

Indexing Vector Databases

Published 2024-09-09 by Kevin Feasel

Brendan Tierney continues a series on vector databases:

In this post on Vector Databases, I’ll explore some of the commonly used indexing techniques available in Databases. I’ll also explore the Vector Indexes available in Oracle 23c. Be sure to check that section towards the end of the post, where I’ll also include links to other articles in this series.

As with most data in a Databases, indexes are used for fast access to data. They create an organised structure (typically B+ tree) for storing the location of certain values within a table. When searching for data, if an index exists on that data, the index will be used for matching and the location of the records is used to quickly retrieve the data.

Read on to get an idea of what kinds of indexing techniques are useful in that space.

Comments closed

Index Rebuilds versus Reorganizations

Published 2024-09-05 by Kevin Feasel

Sergey Gigoyan opens a can of worms:

When data is modified in a Microsoft SQL database, the corresponding indexes are also modified. These modifications cause indexes to become fragmented. Fragmentation means that the logical order of the data inside the index pages does not match its physical order. The more the fragmentation level is increased, the more performance is affected for SELECT statements. With fragmented indexes, the information is not spread logically, making the index’s data retrieving operations more time-consuming resulting in query performance issues. Thus, fragmentation of the indexes should be fixed periodically in order to maintain high performance. Rebuild and Reorganize index operations are aimed at defragmenting indexes.

Click through to learn a bit about why we may need index maintenance, followed by a comparison between index rebuilding versus index reorganization. All that said, I do not have room in my indexing philosophy (or my heart) for reorganization. Reorganization simply doesn’t provide enough value as it is. If you want a really detailed dive into the topic, Jeff Moden put together an amazing talk that we were lucky enough to be able to record. Specifically, about an hour and a quarter in (link to timestamp), Jeff gets to the topic of reorganization and how little it actually does in practice.

Comments closed

Computed Columns and Wide Index Updates

Published 2024-08-30 by Kevin Feasel

Paul White takes us through a performance scenario:

Update execution plans are not something the T-SQL statement writer has much control over. You can affect the data reading side of the plan with query rewrites and hints, but there’s not nearly as much tooling available to affect the writing side of the plan.

Update processing can be extremely complex and reading data-changing execution plans correctly can also be difficult. Many important details are hidden away in obscure and poorly documented properties, or simply not present at all.

In this X article, I want to you show a particularly bad update plan example. It has value in and of itself, but it will also give me a chance to describe some less well-known SQL Server details and behaviours.

Read on for the full story.

Comments closed

Composite Indexes in MySQL

Published 2024-08-30 by Kevin Feasel

Lukas Vileikis needs more than one column:

Indexes in MySQL are one of the primary ways to enhance query performance and they are especially useful when the primary use case of our project refers to reading data stored in a database. We‘ve already told you about the nuanced world of indexes in MySQL – and there we‘ve mentioned that MySQL has multiple types of SQL indexes available for you to choose from.

The primary index type available within MySQL is the B-Tree index which we‘ve already covered in one of our previous articles. If you know your way around MySQL though, you will certainly be aware of other nuances of indexes, too – and one of those nuances has to do with the fact that B-Tree indexes can also consist of multiple columns (commonly referred to as composite indexes). In this example, we’re using MariaDB as our DBMS of choice. Percona Server for MySQL and MySQL Server will act identically.

Find the queries recreating the table structure and composite indexes in the appendix, and let’s get started.

Read on to see how composite indexes work in MySQL. On the whole, it’s quite similar to how they work in SQL Server, though it’s interesting to catch the differences at the edges.

Comments closed

Compressing Indexes and Shrinking Azure SQL MI Databases

Published 2024-08-27 by Kevin Feasel

Kendra Little has a good reason for an often-bad act:

Shrinking databases in SQL Server isn’t fun – it’s slow, it causes blocking if you forget to use the WAIT_AT_LOW_PRIORITY option, and sometimes it persistently fails and refuses to budge until you restart the instance. You only want to shrink a SQL Server database when you’ve got a good reason and a lot of patience.

If you’re using Azure SQL Managed Instance and you haven’t already used data compression on your indexes and shrunk your databases, you probably have two good reasons to do both of those things: performance and cost reduction.

Compressing indexes is very often (almost always?) a good thing. Shrinking databases is very often (again, almost always?) a bad thing. This is like a buddy cop movie for your database.

Kendra gives some good advice but also lays out a warning if you’re on General Purpose V1, so read the whole thing.

Comments closed

Uniquifiers Doing Heavy Lifting

Published 2024-08-23 by Kevin Feasel

Michael J. Swart is one of a kind:

If you define a clustered index that’s not unique, SQL Server will add a hidden 4-byte column called UNIQUIFIER. You can’t see it directly but it’s there. When you add a row whose key is a duplicate of an existing row, the new row gets a new unique value for it’s uniqueifier. If you add over 2.1 billion rows with the same key, the uniquifier value exceeds the limit and you will see error 666.

A while ago, we nearly got into trouble because of a bad choice for clustering key that went undetected for so long.

Click through for a query to see how many clustered indexes need uniquifiers and which have the most duplication of key fields.

Comments closed

Cost Threshold for Parallelism and Missing Indexes

Published 2024-08-22 by Kevin Feasel

Jared Westover explains a phenomenon:

Did you know that the Cost Threshold for Parallelism (CTFP) affects SQL Server’s choice of a trivial execution plan? One area that can suffer from this setting is the optimizer’s ability to suggest index recommendations. When SQL Server picks a trivial plan, it skips suggesting any missing indexes. So, if you’ve set a high CTFP and run simple queries, you might never get those handy index recommendations.

Click through to learn more.

Comments closed

Concurrent Index Creation in Postgres

Published 2024-08-13 by Kevin Feasel

Shayon Mukherjee recommends against a particular technique:

As a developer, you might have encountered situations where creating an index in PostgreSQL fails due to lock timeouts. In such scenarios, it’s tempting to use the IF NOT EXISTS as a quick fix and move on. However, this approach can lead to subtle and hard-to-debug issues in production environments.

Click through to learn more about how concurrent index creation works in Postgres and why the use of IF NOT EXISTS might not work the way you want.

Comments closed

The Internals of DATETIME2

Published 2024-08-09 by Kevin Feasel

Chad Baldwin digs in:

I noticed in sys.column_store_segments the min_data_id and max_data_id columns store very large bigint values in the segments for datetime2 columns. After doing a bit more googling and tinkering, I found for bit/tinyint/smallint/int/bigint it stores the min/max of the actual values rather than dictionary lookup values. So I assume it’s likely doing the same for date/time/datetime/datetime2 and storing some sort of bigint representation of the actual value.

This post is going to focus on datetime2(7) datatypes mainly because that’s what I was dealing with. Though I’m sure it wouldn’t be much work to figure out the other types.

Click through to learn more about the datatype and see how this wraps into a discussion of temporal table cleanup and columnstore indexes.

Comments closed

Columnstore Key Lookups are Bad News

Published 2024-08-06 by Kevin Feasel

Forrest McDaniel does not want to perform that key lookup:

I’ve read it repeatedly, columnstore key lookups are extra slow. The question of course, is why?

In my mental model, it makes sense. A normal key lookup adds about 3 reads.

While a columnstore lookup should add at least a read per column, since each column lives in its own segments.

But it turns out that it’s not a read per column, oh no. Columnstore indexes are amazing for large-scale aggregations and awful for individual lookups.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31