How The Hash Match Join Operation Works

Dmitry Piliugin explains what happens when SQL Server calls for a hash match to join two tables together:

Hash Match in the join mode consumes two inputs, as we are joining two tables. The main idea is to build the hash table using the first “build” input, and then apply the same approach hash the second “probe” input to see if there will be matches of hashed values.

Query Processor (QP) is doing many efforts while building the plan to choose the correct join order. From the Hash Match prospective, it means that QP should choose what table is on the Build side and what is on the Probe side. The Build size should be smaller as it will be stored in memory when building a hash table.

Building a hash table begins with hashing join key values of the build table and placing them to one or another bucket depending on the hash value. Then QP starts processing the probe side, it applies the same hash function to the probe values, determining the bucket and compares the values inside of the bucket. If there is a match – the row is returned.

That would be the whole story if we had infinite memory, but in the real world, it is not true. More to the point, SQL Server allocates memory to the query before the execution starts and does not change it during the execution. That means that if the allocated memory amount is much less than the data size came during the execution, a Hash Match should be able to partition the joining data, and process it in portions that fit allocated memory, while the rest of the data is spilled to the disk waiting to be processed. Here is where the dancing begins.

Read on to learn more about the details of this operation.

Related Posts

How SQL Server Implements Index Spooling

Hugo Kornelis has a long article which dives into the way SQL Server handles index spooling: A Table Spool operator stores its data in a worktable that is structured as a clustered index. The index is not built on any of the columns in the data, nor on any artificially added columns. It is structured […]

Read More

Understanding Hash Match Aggregates

Itzik Ben-Gan continues his series on grouping and aggregating data by looking at the hash match aggregation process: The estimated CPU cost for the Hash Aggregate in the plan for Query 8 is 0.166344, and in Query 9 is 0.16903. It could be an interesting exercise to try and figure out exactly in what way […]

Read More


April 2018
« Mar May »