Press "Enter" to skip to content

Category: Query Tuning

Diving Into the Window Spool Operator

Hugo Kornelis continues a series on execution plan operators:

The Window Spool operator is one of the four spool operators that SQL Server supports. Like other spool operators, it retains a copy of data it receives and can then return those rows as often as needed. The specific functionality of the Window Spool operator allows it to replay rows within a window, as defined in a ROWS or RANGE specification of an OVER clause.

Read on to see how these work, as well as a few differences from their spool brethren.

Comments closed

Making Use of Sort Rewinds: Closest Match

Paul White follows up on an article:

In When Do SQL Server Sorts Rewind? I described how most sorts can only rewind when they contain at most one row. The exception is in-memory sorts, which can rewind at most 500 rows and 16KB of data.

These are certainly tight restrictions, but we can still make use of them on occasion.

To illustrate, I am going reuse a demo Itzik Ben-Gan provided in part one of his Closest Match series, specifically solution 2 (modified value range and indexing).

Click through for the explanation.

Comments closed

When SQL Server Sorts Can Rewind

Paul White turns back the hands of time:

Sorts use storage (memory and perhaps disk if they spill) so they do have a facility capable of storing rows between loop iterations. In particular, the sorted output can, in principle, be replayed (rewound).

Still, the short answer to the title question, “Do Sorts Rewind?” is:

Yes, but you won’t see it very often.

Read the whole thing.

Comments closed

Understanding MERGE Execution Plans

Hugo Kornelis walks us through the most interesting operator:

But first a word of warning. The MERGE statement, introduced in SQL Server 2008 as an easier alternative for “delete / update / insert” logic, turned out to have issues when it was released. And now, in 2020, many of those issues still exist. So I’ll just point you to Aaron Bertrand’s excellent overview, and leave you with the recommendation to be extremely wary before using MERGE in production code.

But here, we are not going to use MERGE in production. We are merely going to set up a simple test and look at how the elements in the execution plan cooperate to produce the expected results. This is interesting even if you never use MERGE, because many of the details explained below can also occur in other execution plans.

Read the whole thing, even if you avoid MERGE like the plague.

Comments closed

Optimize for Unknown with Inline Table-Valued Functions

Koen Verbeeck hits on a strange case:

Turns out SQL Server used a plan with a hash join in the fast query, and a nested loop in the slow query. Due to SQL Server also wildly using incorrect estimates, the nested loops performs really poorly. Quite similar to parameter sniffing with stored procedures. Erik Darling has written a great article about it: Inline Table Valued Functions: Parameter Snorting.

The thing is, in contrast to scalar functions or multi-statement table-valued functions, the iTVF should have better performance because it will be expanded into the calling query. This way, SQL Server can use “more correct” estimates and create a plan for each different parameter. Well, today was not that day.

Read on for details on how Koen performed troubleshooting and the solution.

Comments closed

Finding Skew in a Spark DataFrame

Landon Robinson walks us through skew in Spark DataFrames:

Ignoring issues caused by skew can be worth it sometimes, especially if the skew is not too severe, or isn’t worth the time spent for the performance gained. This is particularly true with one-off or ad-hoc analysis that isn’t likely to be repeated, and simply needs to get done.

However, the rest of the time, we need to find out where the skew is occurring, and take steps to dissolve it and get back to processing our big data. This post will show you one way to help find the source of skew in a Spark DataFrame. It won’t delve into the handful of ways to mitigate it (repartitioning, distributing/clustering, isolation, etc) (but our new book will), but this will certainly help pinpoint where the issue may be.

Click through to learn more.

Comments closed

The Merry-Go-Round Scan

David Fowler covers one of the best ways of optimizing frequent scans of large amounts of data:

As we all know, full table scans can be very expensive, poor old SQL is forced to read every single row in a table (of course that doesn’t always mean that it’s a bad choice for SQL).

Lets assume we’ve got a table scan happening that results in 1,000,000 page reads, that’s quite a bit of work for SQL to do. Now imagine another query comes in and needs to scan the same table, that’s also going to need to do 1,000,000 reads to get the data that it needs. If this table happens to be frequently accessed, this is soon going add up.

There’s a clever solution which tends to work better and better as you have more and more queries scanning the table.

Comments closed

Finding Indexes Not in Use

Dennes Torres takes us through a few iterations of a query to find indexes not in use:

It doesn’t matter if you are trying to remove indexes for good reasons or just to work around a bad environment, let’s see in more details how to find which indexes doesn’t have enough usage to justify their existance.

First, some basic definitions, without going into many details:

Index Seek: That’s the best and desirable use of the index. It means the index tree is being used to go directly to the records we need.

Index Scan: Not so good as an index seek, so it could be better. However, sometimes even an index scan is good, a non clustered index scan means the pages of that index are smaller an better for a scan than the pages of the clustered index. There are many variations that makes an index scan good, but most times you don’t need to reach this level of analysis, you may reach your objective only analysing index seeks.

Update: When the fields are updated (update/insert/delete) all indexes which contain those fields need to be updated as well. Indexes are a balance: We increase performance on reading and suffer a bit more when writting. The problem is when the writting happens more than the reading.

Read on to see Dennes’s query evolve and bring important information to the table. For example, it’s not just how often a particular index gets used; it’s also how important the queries are which use this index. An index may only run once a month, but if it turns the most important report the CEO cares about from running in 4 hours to running in 4 seconds, you bet that index is staying.

Comments closed

Eager Spools in SQL Server

Erik Darling describes the concept of eager spooling:

Not only does SQL Server create an index for you, it doesn’t really tell you about it. There’s no loud warning here.

It also throws that index away when the query is done executing. It only exists in tempdb while the query executes, and it’s only available to the query that builds it.

And boy, they sure can take a long time to build.

I enjoy and frequently use Erik’s depiction that an eager spool is SQL Server’s passive-aggressive way of telling you that you need an index.

Comments closed