Nested Loops, Hash, Or Merge: Which Is Best?

Grant Fritchey dodges the important questions:

First response, also a joke, was the question at the title of this post:

What is the preferred operator when joining tables: Hash Match, Nested Loops or Merge?

While my immediate response to this question is, yes. Meaning, they’re all preferred, situationally. I decided to expand on that a bit.

I completely agree with Grant:  there is no single best operator.  If there were, database companies wouldn’t have multiple options.  That said, in an ideal world, all joins would be merge joins; in our fallen world, nested loops and hash matches often prove superior second-best alternatives.

Parallelism Strategies For Grouping Operations

Itzik Ben-Gan continues his series on grouping data in SQL Server by looking at how these operations can go parallel:

Besides needing to choose between various grouping and aggregation strategies (preordered Stream Aggregate, Sort + Stream Aggregate, Hash Aggregate), SQL Server also needs to choose whether to go with a serial or a parallel plan. In fact, it can choose between multiple different parallelism strategies. SQL Server uses costing logic that results in optimization thresholds that under different conditions make one strategy preferred to the others. We’ve already discussed in depth the costing logic that SQL Server uses in serial plans in the previous parts of the series. In this section I’ll introduce a number of parallelism strategies that SQL Server can use for handling grouping and aggregation. Initially, I won’t get into the details of the costing logic, rather just describe the available options. Later in the article I’ll explain how the costing formulas work, and an important factor in those formulas called DOP for costing.

As you will later learn, SQL Server takes into account the number of logical CPUs in the machine in its costing formulas for parallel plans. In my examples, unless I say otherwise, I assume the target system has 8 logical CPUs. If you want to try out the examples that I’ll provide, in order to get the same plans and costing values like I do, you need to run the code on a machine with 8 logical CPUs as well. If you’re machine happens to have a different number of CPUs, you can emulate a machine with 8 CPUs—for costing purposes—like so:


Even though this tool is not officially documented and supported, it’s quite convenient for research and learning purposes.

This is a fairly long article, but a great one.

Finding Low-Hanging Fruit When Tuning SQL Server

Kevin Hill has a couple scripts which help him find easy performance gains:

Recently, I’ve been getting a lot of performance tuning work, much of which is basically “things are slow…can you fix them?” type of requests.  Most experienced DBAs know that there a few gazillion factors that can lead to this request, and I won’t re-hash them here.

Lets assume, that we’ve optimzed or eliminated the server, disks, network, etc. and were now looking at SQL code and everyone’s favorite – indexes.

I have two scripts I use that give me a quick overview of the type of work SQL Server is doing.   These complement each other, and are used AS A STARTING POINT to locate the low-hanging fruit that can be causing excessive work for the disks and memory of the server.

Click through for those scripts.

The Power Of Predicate Pushdown

Pedro Lopes explains how predicate pushdown helps improve performance on queries:

First, let’s define a few terms, so we can see how to detect whether we’re making good use of our indexes, as they relate to the queries running in our SQL Server.

  1. Whenever you submit a query to SQL Server, if it includes a JOIN and/or WHERE clause, that constitutes a row filtering pattern known as a predicate.
  2. The query optimizer can use that to estimate how to best retrieve only the intended rows, after that predicate has been applied, this surfaces in the query plan as the Estimated Number of Rows.
  3. When that estimated plan is executed, and you look at the actual execution plan, this surfaces as the Actual Number of Rows. Usually, a big difference between Estimated and Actual number of rows indicates a misestimation that may need to be addressed to improve performance: maybe you don’t have the right indexes in place?

These are the two properties related to rows you had on every SQL Server plan up to SQL Server 2014.

Read on to learn how predicate pushdown can make queries faster.

Finding Scalar Functions In Execution Plans

Kendra Little points out that scalar user-defined functions can hide in the most unassuming of places:

After we find matches based on the customer id, we have more work “left over” — that’s the “residual” bit.

For every row that matches, SQL Server is plugging values into the Website.CalculateCustomerPrice() function and comparing the result to the Unit price column, just like we asked for in the where clause.

In other words, this is happening for every row in Sales.InvoiceLines that has a matching row in Sales.Invoices.

Which is every single invoice & invoice line, as it turns out.

It’s a shame there’s no “this is why your query is slow” plan operator for scalar UDFs.

Alleviating tempdb Contention

Pam Lahoud has some advice for those with tempdb-heavy workloads:

TL;DR – Update to the latest CU, create multiple tempdb files, if you’re on SQL 2014 or earlier enable TF 1117 and 1118, if you’re on SQL 2016 enable TF 3427.

And now it’s time for everyone’s favorite SQL Server topic – tempdb! In this article, I’d like to cover some recent changes that you may not be aware of that can help alleviate some common performance issues for systems that have a very heavy tempdb workload. We’re going to cover three different scenarios here:

  1. Object allocation contention

  2. Metadata contention

  3. Auditing overhead (even if you don’t use auditing)

There’s some good information in here so don’t just say tl;dr.

Performance Tuning Window Functions

Kathi Kellenberger gives us some hints on tuning queries using window functions:

The only way to overcome the performance impact of sorting is to create an index specifically for the OVER clause. In his book Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions, Itzik Ben-Gan recommends the POC index. POC stands for (P)ARTITION BY, (O)RDER BY, and (c)overing. He recommends adding any columns used for filtering before the PARTITION BY and ORDER BY columns in the key. Then add any additional columns needed to create a covering index as included columns. Just like anything else, you will need to test to see how such an index impacts your query and overall workload. Of course, you cannot add an index for every query that you write, but if the performance of a particular query that uses a window function is important, you can try out this advice.

There are some good insights here.

Studying Performance Problems With Web Applications

Adrian Colyer reviews a paper on fixing performance problems with ORMs:

This is a fascinating study of the problems people get into when using ORMs to handle persistence concerns in their web applications. The authors study real-world applications and distil a catalogue of common performance anti-patterns. There are a bunch of familiar things in the list, and a few that surprised me with the amount of difference they can make. By fixing many of the issues that they find, Yang et al., are able to quantify how many lines of code it takes to address the issue, and what performance improvement the fix delivers.

Much of this is straightforward, but worth the read.


Lukas Eder compares the OFFSET FETCH logic versus using ROWNUM for grabbing an ordered sub-selection of rows in Oracle:

Now, while the SQL transformation from FETCH FIRST to ROW_NUMBER() filtering is certainly correct, the execution plan doesn’t really make me happy. Consider the ROWNUM based query:

| Id | Operation | Name | Rows |
|* 1 | COUNT STOPKEY | | |
| 2 | VIEW | | 1 |
Predicate Information (identified by operation id):
--------------------------------------------------- 1 - filter(ROWNUM=1)

And compare that to the FETCH FIRST query:

| Id | Operation | Name | Rows |
|* 1 | VIEW | | 1 |
|* 2 | WINDOW SORT PUSHED RANK| | 1000 |
| 3 | TABLE ACCESS FULL | FILM | 1000 |
Predicate Information (identified by operation id):
--------------------------------------------------- 1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1) 2 - filter(ROW_NUMBER() OVER ( ORDER BY "FILM"."FILM_ID")<=1)

Lukas digs into this and is not the biggest fan of OFFSET-FETCH.  On the SQL Server side, my anecdotal experience has been that it doesn’t perform nearly as well as you’d like either.

Join Elimination

Bert Wagner shows off the concept of join elimination in SQL Server:

SQL Server avoids joining to the Sales.Invoices table because it trusts the referential integrity maintained by the foreign key constraint defined on InvoiceID between Sales.InvoiceLines and Sales.Invoices; if a row exists in Sales.InvoiceLines, a row with the matching value for InvoiceID must exist in Sales.Invoices. And since we are only returning data from the Sales.InvoiceLines table, SQL Server doesn’t need to read any pages from Sales.Invoices at all.

We can verify that SQL Server is using the foreign key constraint to eliminate the join by dropping the constraint and running our query again:

ALTER TABLE [Sales].[InvoiceLines] DROP CONSTRAINT [FK_Sales_InvoiceLines_InvoiceID_Sales_Invoices];

Erik Darling shows that the optimizer isn’t perfect at this:

Rob Farley has my favorite material on it. There’s an incredible amount of laziness ingenuity built into the optimizer to keep your servers from doing unnecessary work.

That’s why I’d expect a query like this to throw away the join:

After all, we’re joining the Users table to itself on the PK/CX. This doesn’t stand a chance at eliminating rows, producing duplicate rows, or producing NULL values. We’re only getting a count of the PK/CX, which isn’t NULLable anyway and…

So don’t do that.


July 2018
« Jun