Press "Enter" to skip to content

Category: Query Tuning

Getting the Last Query Plan Stats in SQL Server 2019

John Morehouse walks us through retrieving the actual query plan stats of the last run of an execution plan:

Currently, if you are not on SQL Server 2019 and wanted to see an execution plan, you would attempt to dive into the execution plan cache to retrieve an estimated plan.  Keep in mind that this just an estimated plan and the actual plan, while the shape should be the same, will have runtime metrics.  These  Actual runtime metrics could be, different than what is shown in the estimated plan, so it is important to get the actual whenever possible.

With the introduction of lightweight statistics, SQL Server can retain the metrics of the actual execution plan if enabled.  Note that this could introduce a slight increase in overhead, however, I haven’t yet seen it be determinantal. These metrics are vastly important when performing query tuning.

Read on to see the specific set of metrics you can pull and how to do it. This does require SQL Server 2019.

Comments closed

When Batch Mode on Rowstore Hurts Performance

Erik Darling walks us through a scenario where batch mode on rowstore can make performance of a query worse:

I’m not mad at 2019 or Batch Mode On Rowstore (BMOR) or anything.

But if I’m gonna get into it, I’m gonna document issues I run into so that hopefully they help you out, too.

One thing I ran into recently was where BMOR kicked in for a query and made it slow down.

Click through for the scenario, why it’s slower when using batch mode, and two ways you can improve the query.

Comments closed

Derived Table Nesting and Performance

Itzik Ben-Gan digs into some of the performance considerations around nested derived tables:

Unnesting/substitution of table expressions is a process of taking a query that involves nesting of table expressions, and as if substituting it with a query where the nested logic is eliminated. I should stress that in practice, there’s no actual process in which SQL Server converts the original query string with the nested logic to a new query string without the nesting. What actually happens is that the query parsing process produces an initial tree of logical operators closely reflecting the original query. Then, SQL Server applies transformations to this query tree, eliminating some of the unnecessary steps, collapsing multiple steps into fewer steps, and moving operators around. In its transformations, as long as certain conditions are met, SQL Server can shift things around across what were originally table expression boundaries—sometimes effectively as if eliminating the nested units. All of this in attempt to find an optimal plan.

In this article I cover both cases where such unnesting takes place, as well as unnesting inhibitors. That is, when you use certain query elements it prevent SQL Server from being able to move logical operators in the query tree, forcing it to process the operators based on the boundaries of the table expressions used in the original query.

That’s on my list for a second reading.

Comments closed

Local Variables with TOP and ORDER BY

Erik Darling points out issues with using local variables. First up is with TOP:

In case you missed it for some reason, check out this post of mine about local variables. Though it’s hard to imagine how you missed it, since it’s the single most important blog post ever written, even outside of SQL Server. It might even be more important than SQL Server. Time will tell.

While live streaming recently about paging queries, I thought that it might make an interesting post to see what happens when you use variables in places other than the where clause.

After several seconds of thinking about it, I decided that TOP would be a good enough place to muck around.

After that is ORDER BY:

I see this kind of pattern a lot in paging queries where people are doing everything in their power to avoid writing dynamic SQL for some reason.

It’s almost as if an entire internet work of SQL Server knowledge and advice doesn’t exist when they’re writing these queries.

Quite something. Quite something indeed.

I’d call out Erik’s ORDER BY examples by saying “C’mon, nobody does that!” if I hadn’t actually seen people do that…

Comments closed

What MAXDOP Controls

Pedro Lopes gives us an explanation of what MAXDOP really does for us:

There are plenty of blogs on these topics, and the official documentation does a good job of explaining these (in my opinion). If you want to know more about the guidelines and ways to override for specific queries, refer to the Recommendations section in the Configure the max degree of parallelism Server Configuration Option documentation page.

But what does MAXDOP control? A common understanding is that it controls the number of CPUs that can be used by a query – previous revisions of the documentation used this abstraction. And while that is a correct abstraction, it’s not exactly accurate.

This is definitely a nice companion piece to Paul White’s article on how MAXDOP works.

Comments closed

How MAXDOP Works

Paul White lets us in on a secret:

The nuance to the question is in the phrase “running concurrently”. There is a limit to the number of characters one can use in a Twitter poll. My intention was for people to focus on the number of threads that could have a status of RUNNING at any one moment.

There are various definitions for that status in different DMVs, but essentially RUNNING means actively executing on a processor — not waiting for its turn on a CPU or for some other resource to become available.

My expectation was that most people would think that MAXDOP 4 would limit a parallel query to using 4 CPU cores at any one moment in time. Indeed, that was the second most popular answer to the poll.

But read on to understand why 4 isn’t the correct answer.

1 Comment

Key Lookups and Self-Joins

Erik Darling has an interesting method for eliminating key lookups:

This post isn’t going to go terribly deep into anything, but I do want to make a few things about them more clear, because I don’t usually see them mentioned anywhere.

1. Lookups are joins between two indexes on the same table
2. Lookups can only be done via nested loops joins
3. Lookups can’t be moved around in the execution plan

I don’t want you to think that every lookup is bad and needs to be fixed, but I do want you to understand some of the limitations around optimizing them.

Definitely worth the read.

Comments closed

Delayed Prefetch and Hidden Reads

Hugo Kornelis looks at when worlds collide:

So let’s check. The picture above shows, side by side, the properties of the Index Seek and the Key Lookup operator. They show that the Index Seek did 3 logical reads only, while Key Lookup did 650 logical reads. A clear indication where the majority of the work is done.

But wait. Aren’t we missing something?

The SET STATISTICS IO ON output indicates a total of 722 logical reads. The two screenshots above add up to 653 logical reads. Where are the other 69 logical reads?

Read on for the answer.

Comments closed

Optimizing Read Performance of Heaps

Uwe Ricken continues a series on heaps in SQL Server:

Heaps are not necessarily the developer’s favourite child, as they are not very performant, especially when it comes to selecting data (most people think so!). Certainly, there is something true about this opinion, but in the end, it is always the workload that decides it. In this article, I describe how a Heap works when data are selected. If you understand the process in SQL Server when reading data from a Heap, you can easily decide if a Heap is the best solution for your workload.

Uwe hits on a couple of the (few) use cases where heap performance can match and sometimes surpass clustered index performance.

Comments closed

Calculating Partitions for Processing Data Files in Apache Spark

Ajay Gupta digs into how to calculate the number of partitions the different Spark APIs use when reading from files:

Until recently, the process of picking up a certain number of partitions against a set of data files, always looked mysterious to me. However, recently, during an optimization routine, I wanted to change the default number of partitions picked by Spark for processing a set of data files, and that is when I started to decode this process comprehensively along with proofs. Hopefully, the description of this decoded process would also help the readers to understand Spark a bit deeper and would enable them to design an efficient and optimized Spark routine.

This is important information if you’re tuning Spark cluster performance.

Comments closed