Press "Enter" to skip to content

Category: Indexing

Vector Indexing in PostgreSQL

Hans-Jürgen Schönig builds some indexes on vector data:

In the previous post we have imported a fairly large data set containing Wikipedia data, which we downloaded using pgai. However, importing all this data is not enough because we also need to keep an eye on efficiency. Therefore, it is important to understand that indexing is the key to success.

Click through for an overview of what options are available for indexing vectors in PostgreSQL, as well as the trade-offs and disk space requirements.

Comments closed

Indexing for PostgreSQL in pgNow

Ryan Booz continues a series on pgNow:

In that first article, I shared how pgNow can be a lifesaver when you need immediate performance insights, highlighting features like query tuning and current activity monitoring. The tool’s ability to take periodic snapshots of query activity and spotlight active sessions has already been a significant help for early users.

Today, I wanted to look at another area of information that pgNow can help you explore during times of performance degradation or even as part of a regular database maintenance and hygiene: the Indexing tab.

Click through to see what’s in the feature and to get a free copy of the preview for pgNow.

Comments closed

B-Tree Indexes in PostgreSQL vs SQL Server

Lukas Fittl compares and contrasts:

When it comes to optimizing query performance, indexing is one of the most powerful tools available to database engineers. Both PostgreSQL and Microsoft SQL Server (or Azure SQL) use B-Tree indexes as their default indexing structure, but the way each system implements, maintains, and uses those indexes varies in subtle but important ways.

In this blog post, we explore key areas where PostgreSQL and SQL Server diverge: how their B-Tree indexes implementations behave under the hood and how they store and access data on disk. We’ll also benchmark the impact of deduplication of values on index size in each database system.

I love this kind of post because you hear that SQL Server has indexes and PostgreSQL has indexes (or Oracle has indexes or whatever), and thus, all of your index building knowledge in one applies to the other…right?

One thing that changes the article a bit is that the author doesn’t use page-level compression on indexes in SQL Server. I’d expect the results to change a fair amount, even if the SQL Server non-clustered indexes still ended up larger in the end than PostgreSQL indexes.

Comments closed

GiST Indexes in PostgreSQL

Lee Asher shows off a neat form of indexing:

In Parts I (Solving the Overlap Query Problem in PostgreSQL) and II (Overlapping Ranges in Subsets in PostgreSQL) of this series, we used the GiST index type – and its lesser known cousin: SP-GiST – to turbocharge the performance of overlap queries. But GiST indexes are extremely versatile, with uses far beyond the examples I have used so far.

In this final article, we’ll explore some of the many other ways they can be used.

Click through for use cases, such as multi-dimensional BETWEEN operations (e.g., bid price between X and Y, and ask price between X1 and Y1), date intervals, and more.

Comments closed

Full-Text Search Modes in MySQL

Chad Callihan performs a text search:

When it comes to MySQL queries against columns containing text, FULLTEXT indexes offer a variety of approaches that go beyond your basic pattern matching. When creating a FULLTEXT index for querying text data, we can use MATCH and AGAINST along with our choice of three modes.

Click through for the three modes, as well as examples of queries for each.

Comments closed

Maintaining a Heap Table in SQL Server

Lori Brown performs maintenance:

I have a customer who let me know that some of their tables had a large amount of unused space in them.  They were wondering if I could get them to release the space.  After doing some investigation, I found that all tables with the huge amount of unused space were heap tables.

Read on to see how this is possible. Lori has a bonus script for us as well.

Comments closed

Indexing a Materialized View in PostgreSQL

Elizabeth Christensen adds an index:

Materialized views are widely used in Postgres today. Many of us are working with using connected systems through foreign data wrappers, separate analytics systems like data warehouses, and merging data from different locations with Postgres queries. Materialized views let you precompile a query or partial table, for both local and remote data. Materialized views are static and have to be refreshed.

One of the things that can be really important for using materialized views efficiently is indexing.

Adding indexes to Postgres in general is critical for operation and query performance. Adding indexes for materialized views is also generally recommended for a few different reasons.

Click through for those reasons. This article illustrates some of the differences between indexed views in SQL Server and materialized views in PostgreSQL.

Comments closed

What to Do when SQL Server Ignores Your Non-Clustered Index

Mehdi Ghapanvari wants us to think of the poor, downtrodden non-clustered indexes:

Do you wonder why SQL Server ignores an index? Do you think about how you can deal with this problem? When you include too many columns in a query you decrease index selectivity. So, if you write a query like “Select * From …” the chance of choosing your nonclustered index will decrease.

The article is generally sound, though not all-inclusive. If you really want to use an index, you can use index hints. I personally have no qualms about using them, though hints are telling the optimizer that you know better than it, so be really sure you do actually know better than it before shipping that off.

Comments closed

The Ephemeral Nature of Index Rebuilds on RCSI and ADR

Brent Ozar lays out an argument:

Accelerated Database Recovery (ADR) is a database-level feature that makes transaction rollbacks nearly instantaneous. Here’s how it works.

Without ADR, when you update a row, SQL Server copies the old values into the transaction log and updates the row in-place. If you roll that transaction back, SQL Server has to fetch the old values from the transaction log, then apply them to the row in-place. The more rows you’ve affected, the longer your transaction will take.

With ADR, SQL Server writes a new version of the row inside the table, leaving the old version in place as well.

Because you’re a smart cookie, you immediately recognize that storing multiple versions of a row inside the same table is going to cause a storage problem: we’re going to be boosting the size of our table, quickly. However, the problem’s even bigger than that, and it starts right from the beginning when we load the data.

This was an interesting analysis, looking at table growth with ADR + RCSI, with ADR or RCSI alone, and with neither feature. Given that I’m all-in on RCSI, this is particularly interesting to me. And if you want to dig really deeply into index maintenance, Jeff Moden has a fantastic set of presentations, which TriPASS recorded in 2021: GUIDs vs Fragmentation and LOB data. These two presentations help provide sound footing for deciding under what circumstances it makes sense to rebuild an index, and noting that (unless you’re Brent), the answer is probably “less often than you’d think.”

Comments closed

Partitioned Tables and Indexes in PostgreSQL

Hettie Dombrovskaya runs into an error:

Here is a story. When anyone gives a talk about partitions, they always bring up an example of archiving: there is a partitioned table, and you only keep “current” partitions, whatever “current” means in that context, and after two weeks or a month, or whatever interval works for you, you detach the oldest partition from the “current” table and attach it to the “archived” table, so that the data is still available when you need it, but it does not slow down your “current” queries.

So here is Hettie confidently suggesting that a customer implement this technique to avoid querying a terabyte-plus-size table. A customer happily agrees, and life is great until one day, an archiving job reports an error of a “name already exists” for an index name.

Read on to learn why.

Comments closed