Press "Enter" to skip to content

Category: Indexing

Indexing in Postgres

Grant Fritchey provides an overview:

In this article I’m going to go over the different types of indexes and some index behaviors. We’ll get into what the indexes are, how they work, and how best you can apply them within your databases. I’m hoping you’ll develop an understanding of which indexes are likely to work better in each situation.

However, I won’t be covering the indexes in detail. Further, I won’t be going very far into how these indexes affects performance. I’m not yet up on performance tuning within PostgreSQL. That article is quite a way down the track from here. When you know which index to use where, you can then drill down to better understand that index, and how best to measure the performance impacts.

This set of indexes is quite different from what we’re used to in SQL Server and much closer to what you get in Oracle.

Comments closed

Against the Grain: Heaps, Clustered Indexes, and Fragmentation

Rob Farley cautions against rash decisions on three topics:

We’ve already mentioned how clustered indexes can help avoid some of the problems with heaps, but I’m rarely in favour of the standard clustered index key choice – that of an ever-increasing identity column. The arguments people make for this are typically around avoiding page-splits, and the smallness of the data type. But I see tables where the clustered index is on a unique ID column and nothing ever refers to it.

Read on for a dose of caution on three popular beliefs.

Comments closed

Finding Unused Indexes on an Instance

Matthew McGiffen goes searching:

The script uses the undocumented sp_MSforeachdb internal stored procedure to iterate through all the databases (excluding the system ones) and store the results of a standard unused indexes script into a temp table. The table can then be queried to analyse the results.

This script comes with a caveat, that the sp_MSforeachdb procedure can be unreliable and on occasion skip databases so you should check that the results include all the databases you are interested in. A simple count of distinct database names, and comparing that against the number of databases on your instance should be sufficient.

That caveat aside (and if you are concerned about it, check out Aaron Bertrand’s sp_foreachdb), click through for the script. I do like that this limits you to just zero-read indexes, as even write-heavy indexes may still be useful.

Comments closed

Avoid Unnecessary Indexes: Postgres Edition

Laetitia Avrot has some advice:

This is why, when I’m called for a performance problem (or for an audit), my first take is to look at the size of the data compared to the size of the indexes. If you store more indexes than data for a transactional workload, that’s bad. The worst I’ve seen was a database with 12 times more indexes stored on disk than data! Of course, it was a transactional workload… Would you buy a cooking book with 10 pages of recipes and 120 pages of indexes at the end of the book?

The problem with indexes is that each time you write (insert, update, delete), you will have to write to the indexes too! That can become very costly in resources and time.

Click through for some Postgres-specific guidance and links to some useful scripts along the way.

Comments closed

Bringing Order to a Columnstore Index

Tibor Karaszi puts columnstore ducks in a row:

Data for a columnstore index is divded in groups of approximate 1 million rows, rowgroups. Each rowgroup has a set of pages for each column. The set of pages for a column in a rowgroup is called a segment. SQL Server has meta-data for the lowest and highest value for a segment. There are no SEEKs in a columnstore index. But, SQL Server can use this meta-data to skip reading segments, with the knowledge that “this segment cannot contain any data that I need based on my predicates in my WHERE clause”.

Also, you might want to do these operations using MAXDOP 1, so we don’t have several threads muddling our neat segment alignment.

I’m not sure I actually set the ORDER BY clause on columnstore indexes all that often—a quick mental survey says maybe once, though that could be my own failing rather than a statement on the utility of ordered columnstore indexes.

Comments closed

Lessons Learned from Index Tuning

Lee Markum has seven lessons for us:

SQL Server indexing basics are critical to query and server performance. Resources, like CPU and disk, are affected by the indexes that you have, or the ones you’re missing.

In the StackOverflow2013 database we’re going to look at Badges and users. Specifically, I want to start by seeing what badges a user has and when that user received them. Some badges, because of the type of badge it is, can be awarded more than once.

Click through for a demonstration.

Comments closed

An Analysis of Resumable Online Index Operations

Chris Taylor sums it up:

This is more of a heads up for me / reminder regarding some of the caveats to using ONLINE / RESUMABLE index operations with SQL Server.

Don’t get me wrong, there’s a ton of advantages to using ONLINE and RESUMABLE which I will highlight below but here is the link to the Microsoft Learn page if you want more in depth information

Read on for the pros and cons. I do like the idea, though I personally haven’t used the feature.

Comments closed

Specifying Multiple Indexes in a Table Hint

Michael J. Swart is not satisfied with just one index:

My team wondered if this could be used as to help with a concurrency problem. We recently considered using it to resolve a particular deadlock but we had little success.

It’s useful to think that SQL Server takes locks on index rows instead of table rows. And so the idea we had was that perhaps taking key locks on multiple indexes can help control the order that locks are taken. But after some effort, it didn’t work at avoiding deadlocks. For me, I’ve had better luck using the simpler sp_getapplock.

I now would be curious what the maximum number of such index hints would be so we could apply Swart’s 10% Rule. Though I suppose, no matter the number, this would be a degenerate case of the rule, always leading you down to “one or zero, and stop messing with my database!”

Comments closed

Careful Batching

Michael J. Swart follows up on an older post:

When I wrote Take Care When Scripting Batches, I wanted to guard against a common pitfall when implementing a batching solution (n-squared performance). I suggested a way to be careful. But I knew that my solution was not going to be universally applicable to everyone else’s situation. So I wrote that post with a focus on how to evaluate candidate solutions.

But we developers love recipes for problem solving. I wish it was the case that for whatever kind of problem you got, you just stick the right formula in and problem solved. But unfortunately everyone’s situation is different and the majority of questions I get are of the form “What about my situation?” I’m afraid that without extra details, the best advice remains to do the work to set up the tests and find out for yourself.

Definitely read the original article first. My normal approach is the naive + index method, so I’ll have to try out Michael’s method as well next time I need to delete a chunk of records.

Comments closed