Press "Enter" to skip to content

Category: Indexing

Composite Indexes in MySQL

Lukas Vileikis needs more than one column:

Indexes in MySQL are one of the primary ways to enhance query performance, and they are especially useful when the primary use case of our project involves reading data stored in a database. We’ve already told you about the nuanced world of indexes in MySQL – and there we’ve mentioned that MySQL has multiple types of SQL indexes available for you to choose from.

The primary index type available within MySQL is the B-Tree index, which we’ve already covered in one of our previous articles. If you know your way around MySQL, though, you will certainly be aware of other nuances of indexes, too – and one of those nuances is that B-Tree indexes can also consist of multiple columns (commonly referred to as composite indexes). In this example, we’re using MariaDB as our DBMS of choice. Percona Server for MySQL and MySQL Server will act identically.

Find the queries recreating the table structure and composite indexes in the appendix, and let’s get started.

Read on to see how composite indexes work in MySQL. On the whole, it’s quite similar to how they work in SQL Server, though it’s interesting to catch the differences at the edges.
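To make the leftmost-prefix behavior concrete, here’s a minimal sketch (the table and index names are invented for illustration; Lukas’s actual setup is in his appendix):

    CREATE TABLE demo (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        last_name VARCHAR(64) NOT NULL,
        first_name VARCHAR(64) NOT NULL
    );

    ALTER TABLE demo ADD INDEX idx_name (last_name, first_name);

    -- Can use idx_name (filters on the leftmost column):
    SELECT * FROM demo WHERE last_name = 'Vileikis';

    -- Can use the full index:
    SELECT * FROM demo WHERE last_name = 'Vileikis' AND first_name = 'Lukas';

    -- Cannot use idx_name (skips the leftmost column):
    SELECT * FROM demo WHERE first_name = 'Lukas';

Column order in the definition is the whole game: the index only helps queries that filter on a leftmost prefix of (last_name, first_name).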


Compressing Indexes and Shrinking Azure SQL MI Databases

Kendra Little has a good reason for an often-bad act:

Shrinking databases in SQL Server isn’t fun – it’s slow, it causes blocking if you forget to use the WAIT_AT_LOW_PRIORITY option, and sometimes it persistently fails and refuses to budge until you restart the instance. You only want to shrink a SQL Server database when you’ve got a good reason and a lot of patience.

If you’re using Azure SQL Managed Instance and you haven’t already used data compression on your indexes and shrunk your databases, you probably have two good reasons to do both of those things: performance and cost reduction.

Compressing indexes is very often (almost always?) a good thing. Shrinking databases is very often (again, almost always?) a bad thing. This is like a buddy cop movie for your database.

Kendra gives some good advice but also lays out a warning if you’re on General Purpose V1, so read the whole thing.
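As a rough T-SQL sketch of the pairing (object names are placeholders, and the WAIT_AT_LOW_PRIORITY option for shrink requires SQL Server 2022 or Azure SQL Managed Instance, so check the syntax on your platform):

    -- Rebuild an index with page compression; ONLINE keeps it available:
    ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
        REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);

    -- Then reclaim the freed space, backing off rather than blocking others:
    DBCC SHRINKDATABASE (N'MyDatabase', 10) WITH WAIT_AT_LOW_PRIORITY;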


Uniquifiers Doing Heavy Lifting

Michael J. Swart is one of a kind:

If you define a clustered index that’s not unique, SQL Server will add a hidden 4-byte column called UNIQUIFIER. You can’t see it directly, but it’s there. When you add a row whose key is a duplicate of an existing row, the new row gets a new unique value for its uniquifier. If you add over 2.1 billion rows with the same key, the uniquifier value exceeds the limit and you will see error 666.

A while ago, we nearly got into trouble because of a bad choice of clustering key that went undetected for a long time.

Click through for a query to see how many clustered indexes need uniquifiers and which have the most duplication of key fields.
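Michael’s diagnostic query is in the post; as a bare-bones starting point, the indexes that carry a uniquifier are simply the non-unique clustered ones:

    -- Non-unique clustered indexes: every duplicate key value gets a
    -- hidden 4-byte uniquifier.
    SELECT OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
           OBJECT_NAME(i.object_id) AS table_name,
           i.name AS index_name
    FROM sys.indexes AS i
    WHERE i.type = 1        -- clustered
      AND i.is_unique = 0;  -- not unique, so a uniquifier is added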


Cost Threshold for Parallelism and Missing Indexes

Jared Westover explains a phenomenon:

Did you know that the Cost Threshold for Parallelism (CTFP) affects SQL Server’s choice of a trivial execution plan? One area that can suffer from this setting is the optimizer’s ability to suggest index recommendations. When SQL Server picks a trivial plan, it skips suggesting any missing indexes. So, if you’ve set a high CTFP and run simple queries, you might never get those handy index recommendations.

Click through to learn more.
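A rough sketch of how you might observe this (not Jared’s exact queries): check the configured threshold, then look for cached plans compiled at the trivial optimization level, which carry no missing-index suggestions:

    -- Configured threshold:
    SELECT name, value_in_use
    FROM sys.configurations
    WHERE name = 'cost threshold for parallelism';

    -- Crude string match for trivial plans in the cache:
    SELECT st.text, qp.query_plan
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE N'%StatementOptmLevel="TRIVIAL"%';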


Concurrent Index Creation in Postgres

Shayon Mukherjee recommends against a particular technique:

As a developer, you might have encountered situations where creating an index in PostgreSQL fails due to lock timeouts. In such scenarios, it’s tempting to use IF NOT EXISTS as a quick fix and move on. However, this approach can lead to subtle and hard-to-debug issues in production environments.

Click through to learn more about how concurrent index creation works in Postgres and why the use of IF NOT EXISTS might not work the way you want.
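A minimal sketch of the failure mode Shayon describes (table and index names are hypothetical):

    -- Builds without a long write-blocking lock, but if it fails (say, on a
    -- lock timeout), it leaves an INVALID index behind:
    CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer
        ON orders (customer_id);

    -- A retry with IF NOT EXISTS sees the name and silently does nothing,
    -- even though the index is unusable. Check for leftovers:
    SELECT c.relname
    FROM pg_index AS i
    JOIN pg_class AS c ON c.oid = i.indexrelid
    WHERE NOT i.indisvalid;

    -- The fix is to drop the invalid index and rebuild it:
    DROP INDEX CONCURRENTLY IF EXISTS idx_orders_customer;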


The Internals of DATETIME2

Chad Baldwin digs in:

I noticed in sys.column_store_segments that the min_data_id and max_data_id columns store very large bigint values in the segments for datetime2 columns. After doing a bit more googling and tinkering, I found that for bit/tinyint/smallint/int/bigint, it stores the min/max of the actual values rather than dictionary lookup values. So I assume it’s likely doing the same for date/time/datetime/datetime2 and storing some sort of bigint representation of the actual value.

This post is going to focus mainly on the datetime2(7) datatype because that’s what I was dealing with, though I’m sure it wouldn’t be much work to figure out the other types.

Click through to learn more about the datatype and see how this wraps into a discussion of temporal table cleanup and columnstore indexes.
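A sketch of the sort of query Chad starts from (the table name is a placeholder, and the column_id join is simplified; column ordinals in a columnstore don’t always line up with the table’s):

    SELECT c.name AS column_name, s.segment_id,
           s.min_data_id, s.max_data_id
    FROM sys.column_store_segments AS s
    JOIN sys.partitions AS p
        ON p.partition_id = s.partition_id
    JOIN sys.columns AS c
        ON c.object_id = p.object_id
       AND c.column_id = s.column_id  -- simplified mapping
    WHERE p.object_id = OBJECT_ID('dbo.MyTemporalHistory');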


Columnstore Key Lookups are Bad News

Forrest McDaniel does not want to perform that key lookup:

I’ve read it repeatedly: columnstore key lookups are extra slow. The question, of course, is why?

In my mental model, it makes sense. A normal key lookup adds about 3 reads.

A columnstore lookup, meanwhile, should add at least a read per column, since each column lives in its own segments.

But it turns out that it’s not just a read per column – oh no. Columnstore indexes are amazing for large-scale aggregations and awful for individual lookups.
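A hypothetical repro, not Forrest’s own script: a nonclustered B-tree index on top of a clustered columnstore, plus a query that seeks the B-tree and then has to fetch the remaining columns from the columnstore:

    CREATE TABLE dbo.Sales (
        SaleID INT NOT NULL,
        CustomerID INT NOT NULL,
        Amount DECIMAL(18, 2) NOT NULL,
        SaleDate DATETIME2 NOT NULL,
        INDEX CCI_Sales CLUSTERED COLUMNSTORE
    );

    CREATE NONCLUSTERED INDEX IX_Sales_CustomerID ON dbo.Sales (CustomerID);

    -- The seek on IX_Sales_CustomerID finds the row; fetching Amount and
    -- SaleDate then requires a key lookup into the columnstore:
    SELECT Amount, SaleDate
    FROM dbo.Sales
    WHERE CustomerID = 42;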


The Challenges of Index Cleanup

Aaron Bertrand talks index maintenance of a different sort:

I’m not the first person to write about cleaning up unused or redundant indexes. You can read many articles about how non-clustered indexes are expensive to store and maintain, that they can occupy valuable space in memory, and that they can negatively impact the performance of the write portion of any SQL Server workload. There are great scripts out there already – like sp_BlitzIndex – that can help identify index issues without any grunt work. So, nothing new in this area.

But reinforcement can be beneficial.

Aaron raises some good points and explains why simply deleting any index with 0 reads is not necessarily a good idea. All bets are off, however, when you see 45 indexes on the same table with the same dta_ prefix.
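As a first-pass sketch along the lines Aaron describes (this is not his script or sp_BlitzIndex, and usage stats reset on restart, so treat the output as candidates to investigate, not a drop list):

    SELECT OBJECT_NAME(i.object_id) AS table_name,
           i.name AS index_name,
           us.user_updates
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS us
        ON us.object_id = i.object_id
       AND us.index_id = i.index_id
       AND us.database_id = DB_ID()
    WHERE i.type = 2  -- nonclustered
      AND ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0)
          + ISNULL(us.user_lookups, 0) = 0;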


Keyset Pagination and Descending Order

Laurenz Albe digs into keyset pagination:

Keyset pagination is the most performant way to retrieve a large result set page by page. However, the neat trick with composite type comparison doesn’t always work. This article explains why and how you can work around that shortcoming.

All of the examples are for Postgres, though in my read-through of the post, I’d say it applies almost as well to SQL Server.
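A sketch of both cases in Postgres syntax (table and parameter names invented; $1 and $2 hold the key values of the last row from the previous page):

    -- Both keys descending: the row-value comparison works and matches an
    -- index on (created_at DESC, id DESC):
    SELECT * FROM items
    WHERE (created_at, id) < ($1, $2)
    ORDER BY created_at DESC, id DESC
    LIMIT 50;

    -- Mixed directions (created_at DESC, id ASC): the row-value trick no
    -- longer matches the ordering, so the predicate must be spelled out:
    SELECT * FROM items
    WHERE created_at < $1
       OR (created_at = $1 AND id > $2)
    ORDER BY created_at DESC, id ASC
    LIMIT 50;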


Merge Join vs Hash Join in Postgres

Andrei Lepikhov compares two physical join operators:

Today’s post is sparked by a puzzling observation: users, especially those who use an abstraction layer like REST or an ORM library to interact with databases, frequently disable the MergeJoin option across the entire database instance. They justify this action by citing numerous instances of performance degradation.

Considering how many interesting execution paths MergeJoin adds to the system – elaborating on IncrementalSort or sort orderings derived from an underlying IndexScan – it looks strange: is this one more bug of skewed cost balance inside the PostgreSQL cost model?

This is an interesting peek into how complex the query optimizers in database engines are, as well as how small amounts of information (via statistics or indexes) can matter to a query.
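For reference, this is the knob in question; a session-scoped sketch with placeholder tables, rather than anything from Andrei’s post:

    -- What some users apparently set instance-wide:
    SET enable_mergejoin = off;

    EXPLAIN (ANALYZE)
    SELECT *
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY c.id;

    RESET enable_mergejoin;

Flipping a planner setting per session like this is a standard diagnostic move; disabling it instance-wide throws those execution paths away for every query.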
