How The Hash Match Join Operation Works

Dmitry Piliugin explains what happens when SQL Server calls for a hash match to join two tables together:

Hash Match in the join mode consumes two inputs, as we are joining two tables. The main idea is to build the hash table using the first “build” input, and then apply the same approach hash the second “probe” input to see if there will be matches of hashed values.

Query Processor (QP) is doing many efforts while building the plan to choose the correct join order. From the Hash Match prospective, it means that QP should choose what table is on the Build side and what is on the Probe side. The Build size should be smaller as it will be stored in memory when building a hash table.

Building a hash table begins with hashing join key values of the build table and placing them to one or another bucket depending on the hash value. Then QP starts processing the probe side, it applies the same hash function to the probe values, determining the bucket and compares the values inside of the bucket. If there is a match – the row is returned.

That would be the whole story if we had infinite memory, but in the real world, it is not true. More to the point, SQL Server allocates memory to the query before the execution starts and does not change it during the execution. That means that if the allocated memory amount is much less than the data size came during the execution, a Hash Match should be able to partition the joining data, and process it in portions that fit allocated memory, while the rest of the data is spilled to the disk waiting to be processed. Here is where the dancing begins.

Read on to learn more about the details of this operation.

Related Posts

Diving Into Columnstore Index Scans

Hugo Kornelis continues a series of posts on index scans: The Columnstore Index Scan is not really an actual operator. You can encounter it in graphical execution plans in SSMS (and other tools), but if you look at the underlying XML of the execution plan, you will see that it is either an Index Scan or a Clustered Index Scan operator. […]

Read More

Physical Operators: Apply and Nested Loops

Paul Whtie takes us through the Apply operator versus a classic nested loop join operator: The optimizer’s output may contain both apply and nested loops join physical operations. Both are shown in execution plans as a Nested Loops Join operator, but they have different properties: ApplyThe Nested Loops Join operator has Outer References. These describe parameter values passed from the outer (upper) side of the join […]

Read More


April 2018
« Mar May »