Press "Enter" to skip to content

Category: Indexing

GiST Indexes in PostgreSQL

Lee Asher shows off a neat form of indexing:

In Parts I (Solving the Overlap Query Problem in PostgreSQL) and II (Overlapping Ranges in Subsets in PostgreSQL) of this series, we used the GiST index type – and its lesser known cousin: SP-GiST – to turbocharge the performance of overlap queries. But GiST indexes are extremely versatile, with uses far beyond the examples I have used so far.

In this final article, we’ll explore some of the many other ways they can be used.

Click through for use cases, such as multi-dimensional BETWEEN operations (e.g., bid price between X and Y, and ask price between X1 and Y1), date intervals, and more.

Comments closed

Full-Text Search Modes in MySQL

Chad Callihan performs a text search:

When it comes to MySQL queries against columns containing text, FULLTEXT indexes offer a variety of approaches that go beyond your basic pattern matching. When creating a FULLTEXT index for querying text data, we can use MATCH and AGAINST along with our choice of three modes.

Click through for the three modes, as well as examples of queries for each.

Comments closed

Maintaining a Heap Table in SQL Server

Lori Brown performs maintenance:

I have a customer who let me know that some of their tables had a large amount of unused space in them.  They were wondering if I could get them to release the space.  After doing some investigation, I found that all tables with the huge amount of unused space were heap tables.

Read on to see how this is possible. Lori has a bonus script for us as well.

Comments closed

Indexing a Materialized View in PostgreSQL

Elizabeth Christensen adds an index:

Materialized views are widely used in Postgres today. Many of us are working with using connected systems through foreign data wrappers, separate analytics systems like data warehouses, and merging data from different locations with Postgres queries. Materialized views let you precompile a query or partial table, for both local and remote data. Materialized views are static and have to be refreshed.

One of the things that can be really important for using materialized views efficiently is indexing.

Adding indexes to Postgres in general is critical for operation and query performance. Adding indexes for materialized views is also generally recommended for a few different reasons.

Click through for those reasons. This article illustrates some of the differences between indexed views in SQL Server and materialized views in PostgreSQL.

Comments closed

What to Do when SQL Server Ignores Your Non-Clustered Index

Mehdi Ghapanvari wants us to think of the poor, downtrodden non-clustered indexes:

Do you wonder why SQL Server ignores an index? Do you think about how you can deal with this problem? When you include too many columns in a query you decrease index selectivity. So, if you write a query like “Select * From …” the chance of choosing your nonclustered index will decrease.

The article is generally sound, though not all-inclusive. If you really want to use an index, you can use index hints. I personally have no qualms about using them, though hints are telling the optimizer that you know better than it, so be really sure you do actually know better than it before shipping that off.

Comments closed

The Ephemeral Nature of Index Rebuilds on RCSI and ADR

Brent Ozar lays out an argument:

Accelerated Database Recovery (ADR) is a database-level feature that makes transaction rollbacks nearly instantaneous. Here’s how it works.

Without ADR, when you update a row, SQL Server copies the old values into the transaction log and updates the row in-place. If you roll that transaction back, SQL Server has to fetch the old values from the transaction log, then apply them to the row in-place. The more rows you’ve affected, the longer your transaction will take.

With ADR, SQL Server writes a new version of the row inside the table, leaving the old version in place as well.

Because you’re a smart cookie, you immediately recognize that storing multiple versions of a row inside the same table is going to cause a storage problem: we’re going to be boosting the size of our table, quickly. However, the problem’s even bigger than that, and it starts right from the beginning when we load the data.

This was an interesting analysis, looking at table growth with ADR + RCSI, with ADR or RCSI alone, and with neither feature. Given that I’m all-in on RCSI, this is particularly interesting to me. And if you want to dig really deeply into index maintenance, Jeff Moden has a fantastic set of presentations, which TriPASS recorded in 2021: GUIDs vs Fragmentation and LOB data. These two presentations help provide sound footing for deciding under what circumstances it makes sense to rebuild an index, and noting that (unless you’re Brent), the answer is probably “less often than you’d think.”

Comments closed

Partitioned Tables and Indexes in PostgreSQL

Hettie Dombrovskaya runs into an error:

Here is a story. When anyone gives a talk about partitions, they always bring up an example of archiving: there is a partitioned table, and you only keep “current” partitions, whatever “current” means in that context, and after two weeks or a month, or whatever interval works for you, you detach the oldest partition from the “current” table and attach it to the “archived” table, so that the data is still available when you need it, but it does not slow down your “current” queries.

So here is Hettie confidently suggesting that a customer implement this technique to avoid querying a terabyte-plus-size table. A customer happily agrees, and life is great until one day, an archiving job reports an error of a “name already exists” for an index name.

Read on to learn why.

Comments closed

PostgreSQL and Indexing on EXTRACT()

Henrietta Dombrovskaya troubleshoots a performance problem:

It’s Christmas time and relatively quiet in my day job, so let’s make it story time again! One more tale from the trenches: how wrong you can go with one table and one index?

Several weeks ago, a user asked me why one of the queries had an “inconsistent performance.” According to the user, “Sometimes it takes three minutes, sometimes thirty, or just never finishes.” After taking a look at the query, I could tell that the actual problem was not the 30+ minutes, but 3 minutes – when you have a several hundred million row table and your select yields just over a thousand rows, it’s a classical “short query,” so you should be able to get results in milliseconds.

Read on for the problem, as well as how Henrietta was able to coerce the PostgreSQL optimizer into choosing the correct path.

Comments closed

GiST Indexes and Range Queries in PostgreSQL

Lee Asher can’t be limited to a single point:

Our Part I query used the following WHERE clause:

WHERE tsrange(o.start_time, o.end_time) && tsrange(p.enter, p.leave)

The “tsrange()” functions return timestamp ranges. But overlap queries aren’t limited to timestamps; they can be constructed from integers and floating-point values too. Imagine an arbitrage database that tracks the minimum and maximum price paid for a commodity.

Read on for examples of other types of ranges, preventing range intersection, and more.

Comments closed

Indexes in PostgreSQL versus Oracle

Umair Shahid continues a series on migrating from Oracle to PostgreSQL:

For database experts well-versed in Oracle, moving to PostgreSQL opens up new indexing methods that differ significantly in terms of structure, management, and optimization. While both databases leverage indexing to enhance query speed, their approaches vary, particularly in terms of available types, performance tuning, and maintenance. This guide clarifies key differences and provides practical strategies for effectively handling indexes in PostgreSQL.

Read on for the article.

Comments closed