Press "Enter" to skip to content

Category: Indexing

An Analysis of Resumable Online Index Operations

Chris Taylor sums it up:

This is more of a heads up for me / reminder regarding some of the caveats to using ONLINE / RESUMABLE index operations with SQL Server.

Don’t get me wrong, there’s a ton of advantages to using ONLINE and RESUMABLE which I will highlight below but here is the link to the Microsoft Learn page if you want more in depth information

Read on for the pros and cons. I do like the idea, though I personally haven’t used the feature.

Comments closed

Specifying Multiple Indexes in a Table Hint

Michael J. Swart is not satisfied with just one index:

My team wondered if this could be used as to help with a concurrency problem. We recently considered using it to resolve a particular deadlock but we had little success.

It’s useful to think that SQL Server takes locks on index rows instead of table rows. And so the idea we had was that perhaps taking key locks on multiple indexes can help control the order that locks are taken. But after some effort, it didn’t work at avoiding deadlocks. For me, I’ve had better luck using the simpler sp_getapplock.

I now would be curious what the maximum number of such index hints would be so we could apply Swart’s 10% Rule. Though I suppose, no matter the number, this would be a degenerate case of the rule, always leading you down to “one or zero, and stop messing with my database!”

Comments closed

Careful Batching

Michael J. Swart follows up on an older post:

When I wrote Take Care When Scripting Batches, I wanted to guard against a common pitfall when implementing a batching solution (n-squared performance). I suggested a way to be careful. But I knew that my solution was not going to be universally applicable to everyone else’s situation. So I wrote that post with a focus on how to evaluate candidate solutions.

But we developers love recipes for problem solving. I wish it was the case that for whatever kind of problem you got, you just stick the right formula in and problem solved. But unfortunately everyone’s situation is different and the majority of questions I get are of the form “What about my situation?” I’m afraid that without extra details, the best advice remains to do the work to set up the tests and find out for yourself.

Definitely read the original article first. My normal approach is the naive + index method, so I’ll have to try out Michael’s method as well next time I need to delete a chunk of records.

Comments closed

Eager Index Spooling

Chad Callihan has a great analogy for eager spools:

Think of a time when you went to someone for help on a problem. Rather than walk you through how to solve the problem, that someone just groans, speeds through solving it for you, and sends you on your way without any explanation.

Did you learn anything? No.

Will you need help again next time that same problem comes up? You bet.

I also like Erik Darling’s explanation that an eager spool is SQL Server’s passive-aggressive way of telling you that you need an index but no, you’re just too busy to create one so I’ll just keep working here all day and do you even call your poor mother anymore?

Comments closed

Blocking during Creation of Indexed Views

Tom Zika runs into a problem:

There was a need to make changes to a table with an Indexed View. Since Indexed Views must be created with SCHEMABINDING, the View must be dropped and recreated.

From past experience, I knew that this operation blocked all queries (Read/Write) that referenced any table from the View’s definition for the duration of the Clustered index creation, even under the RCSI level.

Because the index might be large and the maintenance window small, I want to do that as fast as possible.

Tom does a good job of taking us through the problem as well as the solution. Definitely worth a read if you’re creating indexed views with Enterprise Edition.

Comments closed

Finding SQL Server Indexes with Unused INCLUDE Columns

Dave Mason does some digging:

Periodic index analysis for SQL Server typically involves tasks such as checking for missing/unused/overlapping indexes, checking for heaps that maybe should have been designed with a clustered index, analyzing ROW/PAGE compression, etc. There are numerous DMVs that you can use as a starting point for those tasks. There’s also some good open source tools and scripts that members of the SQL Server community have created and shared. One task that I don’t recall ever seeing was an analysis of INCLUDE columns for nonclustered indexes. What I really wanted to do was to find INCLUDE columns that were never being used, and remove them from index definitions.

Click through to see what Dave learned, as well as the repo containing several useful scripts.

Comments closed

B-Tree and Hash Indexes in Postgres

Paul Randal takes us through indexing in Postgres:

This article explores the PostgreSQL implementation of the B-Tree (the B stands for Balanced) and hash index data structures. As PostgreSQL grows in popularity as an open-source database system for developers and as a target for migrating from Oracle workloads, understanding how PostgreSQL indexes work is extremely important for database developers and administrators. PostgreSQL has several other types of indexes, such as GIN indexes, GiST indexes, and BRIN indexes. I will omit them for this article as they’re somewhat specialty indexes suited for text-based searches, geography, and other complex data types. And while B-Tree index usage makes up roughly 90% of use cases, hash indexes and their concepts are also important to understand.

Understanding and implementing the correct indexes for the workload is the foundation of any well-running relational database system. Adding and adjusting indexes to suit the workload has yielded some of the most significant performance gains over my many years of consulting. However, to add the right indexes, you must first understand them.

Read on to learn more. Even if you live in SQL Server, this is a really good article to read because the types of indexes available can differ radically between platforms and that naturally affects indexing strategy.

Comments closed

Ordered Columnstore Indexes

Joe Obbish and Erik Darling tag team on this one. First, Joe looks at the details of what the CCI ordering process does:

The sort for inserting into an ordered columnstore is a DML request sort. It appears to use the same internal mechanism as the sort that’s added for inserting into partitioned columnstore tables. The difference is that the data is sorted by the specified columns instead of a calculated partition id. In my testing, the sort appears to be a best effort sort that does not spill to tempdb. This means that if SQL Server thinks there won’t be enough memory then the data will not be fully sorted. Parallel inserts have an additional complication. 

And Erik has a messy work-around:

Anyway, I decided to dig in and see what was going on behind the scenes. Which of course, means query plans, and bothering people who are really good at debuggers.

Most of the problems that you’ll run into in SQL Server will come from sorting data.

Whenever I have to think about Sorts, I head to this post about all the different Sorts you might see in a query plan.

Definitely read both posts.

Comments closed

Indexes: Gotta Add ‘Em All

Andy Yun tells a tale of woe:

I once worked for an Entity Framework SaaS company that was having horrific T-SQL performance issues. We put out a major release one weekend and spent the subsequent week buried in horrible performance hell. Because all T-SQL code was generated from Entity Framework (stored procedures were banned by a development exec prior to me joining the company), there were no easy fixes that could be implemented (and a Production release typically took 12 – 36 hours, I wish I was joking).

The manager of infrastructure had heard about SQL Server’s missing index recommendations:

Yep, it ends about as poorly as you’d expect.

Comments closed

Ordered Columnstore Indexes in SQL Server 2022

Brent Ozar appreciates order:

So essentially, every column has a whole bunch of indexes on it.

But there’s no order whatsoever as to which rows end up in which index.

This isn’t a problem for relatively small tables, but as you get to billion-row data warehouse fact tables where columnstore should really shine, performance gradually degrades. In data warehouses, fact tables often have a commonly filtered column, like SaleDate. However, until SQL Server 2022, even if you wanted a small SaleDate range, your query would likely check hundreds or thousands of row groups, each of which had a huge range of data.

But do read the whole thing, as it seems it’s not working correctly in CTP 2.0 of SQL Server 2022. It is quite useful in Azure Synapse Analytics dedicated SQL pools, at least—that I can confirm.

Comments closed