Press "Enter" to skip to content

Category: Indexing

Cost Threshold for Parallelism and Missing Indexes

Jared Westover explains a phenomenon:

Did you know that the Cost Threshold for Parallelism (CTFP) affects SQL Server’s choice of a trivial execution plan? One area that can suffer from this setting is the optimizer’s ability to suggest index recommendations. When SQL Server picks a trivial plan, it skips suggesting any missing indexes. So, if you’ve set a high CTFP and run simple queries, you might never get those handy index recommendations.

Click through to learn more.

Comments closed

Concurrent Index Creation in Postgres

Shayon Mukherjee recommends against a particular technique:

As a developer, you might have encountered situations where creating an index in PostgreSQL fails due to lock timeouts. In such scenarios, it’s tempting to use the IF NOT EXISTS as a quick fix and move on. However, this approach can lead to subtle and hard-to-debug issues in production environments.

Click through to learn more about how concurrent index creation works in Postgres and why the use of IF NOT EXISTS might not work the way you want.

Comments closed

The Internals of DATETIME2

Chad Baldwin digs in:

I noticed in sys.column_store_segments the min_data_id and max_data_id columns store very large bigint values in the segments for datetime2 columns. After doing a bit more googling and tinkering, I found for bit/tinyint/smallint/int/bigint it stores the min/max of the actual values rather than dictionary lookup values. So I assume it’s likely doing the same for date/time/datetime/datetime2 and storing some sort of bigint representation of the actual value.

This post is going to focus on datetime2(7) datatypes mainly because that’s what I was dealing with. Though I’m sure it wouldn’t be much work to figure out the other types.

Click through to learn more about the datatype and see how this wraps into a discussion of temporal table cleanup and columnstore indexes.

Comments closed

Columnstore Key Lookups are Bad News

Forrest McDaniel does not want to perform that key lookup:

I’ve read it repeatedly, columnstore key lookups are extra slow. The question of course, is why?

In my mental model, it makes sense. A normal key lookup adds about 3 reads.

While a columnstore lookup should add at least a read per column, since each column lives in its own segments.

But it turns out that it’s not a read per column, oh no. Columnstore indexes are amazing for large-scale aggregations and awful for individual lookups.

Comments closed

The Challenges of Index Cleanup

Aaron Bertrand talks index maintenance of a different sort:

I’m not the first person to write about cleaning up unused or redundant indexes. You can read many articles about how non-clustered indexes are expensive to store and maintain, that they can occupy valuable space in memory, and that they can negatively impact the performance of the write portion of any SQL Server workload. There are great scripts out there already – like sp_BlitzIndex – that can help identify index issues without any grunt work. So, nothing new in this area.

But reinforcement can be beneficial.

Aaron raises some good points and explains why simply deleting any index with 0 reads is not necessarily a good idea. All bets are off, however, when you see 45 indexes on the same table with the same dta_ prefix.

Comments closed

Keyset Pagination and Descending Order

Laurenz Albe digs into keyset pagination:

Keyset pagination is the most performant way to retrieve a large result set page by page. However, the neat trick with composite type comparison doesn’t always work. This article explains why and how you can work around that shortcoming.

All of the examples are for Postgres, though in my read-through of the post, I’d say it applies almost as well to SQL Server.

Comments closed

Merge Join vs Hash Join in Postgres

Andrei Lepikhov compares two physical join operators:

Today’s post is sparked by a puzzling observation: users, especially those who use an abstraction layer like REST or ORM library to interact with databases, frequently disable the MergeJoin option across the entire database instance. They justify this action by citing numerous instances of performance degradation.

Considering how many interesting execution paths MergeJoin adds to the system elaborating IncrementalSort or sort orderings derived from underlying IndexScan, it looks strange: one more bug of skewed cost balance inside the PostgreSQL cost model?

This is an interesting peek into how complex the query optimizers in database engines are, as well as how small amounts of information (via statistics or indexes) can matter to a query.

Comments closed

Indexing for Deletion

Jared Westover needs to delete some rows:

In this article, we’ll examine the importance of indexes in improving DELETE statements. Also, many developers assume that adding a foreign key creates an index, which means they’re typically missing. How can you identify which foreign keys are missing indexes? Don’t worry. A handy DMV helps find them. Starting today, you’ll have the skills to make your DELETE statements faster.

Click through for the full post.

Comments closed

Performance Tuning XML Operations in SQL Server

Ed Pollack does a bit of tuning:

SQL Server provides a variety of ways to tune XML so that it provides consistent performance, consumes less space, all while ensuring efficient access to critical data.

At its core, the metadata-styled XML format runs counter to the data that SQL Server is optimized to manage. Therefore, additional features were added to SQL Server over time that allowed for XML data to be indexed and compressed.

While these features are critical for managing XML data as it becomes large, it is important to remember what XML is intended for and why it is (loosely) structured as it is. Many data professionals have used shortcuts when XML was small, such as storing and analyzing it in string format, only to be forced to reckon with performance challenges when scanning large strings become agonizingly slow.

Read on for the full article.

Comments closed

Parallel Index Builds in SQL Server

Paul White delves into history:

SQL Server doesn’t support parallel modifications to a b-tree index.

That might sound surprising. After all, you can certainly write to the same b-tree index from multiple sessions concurrently. For example, two sessions can happily write alternating odd and even numbers to the same integer b-tree index. So long as both sessions execute on different schedulers and take row locks, there will be no blocking and you’ll get true concurrency.

No, what I mean is: A single session can’t write to a b-tree index using more than one thread. No parallel plan modifications of a b-tree index, in other words. It’s a bit like the lack of parallel backward ordered scans. There’s no reason it couldn’t be implemented, but it hasn’t been so far.

Click through for a link to the full article. Or click the link I just added, your choice.

Comments closed