How The Hash Match Join Operation Works

Dmitry Piliugin explains what happens when SQL Server calls for a hash match to join two tables together:

Hash Match in the join mode consumes two inputs, as we are joining two tables. The main idea is to build the hash table using the first “build” input, and then apply the same approach hash the second “probe” input to see if there will be matches of hashed values.

Query Processor (QP) is doing many efforts while building the plan to choose the correct join order. From the Hash Match prospective, it means that QP should choose what table is on the Build side and what is on the Probe side. The Build size should be smaller as it will be stored in memory when building a hash table.

Building a hash table begins with hashing join key values of the build table and placing them to one or another bucket depending on the hash value. Then QP starts processing the probe side, it applies the same hash function to the probe values, determining the bucket and compares the values inside of the bucket. If there is a match – the row is returned.

That would be the whole story if we had infinite memory, but in the real world, it is not true. More to the point, SQL Server allocates memory to the query before the execution starts and does not change it during the execution. That means that if the allocated memory amount is much less than the data size came during the execution, a Hash Match should be able to partition the joining data, and process it in portions that fit allocated memory, while the rest of the data is spilled to the disk waiting to be processed. Here is where the dancing begins.

Read on to learn more about the details of this operation.

Related Posts

Understanding Hash Match Aggregates

Itzik Ben-Gan continues his series on grouping and aggregating data by looking at the hash match aggregation process: The estimated CPU cost for the Hash Aggregate in the plan for Query 8 is 0.166344, and in Query 9 is 0.16903. It could be an interesting exercise to try and figure out exactly in what way […]

Read More

Row Width And Snapshot Isolation

Kendra Little shows us the impact that row width has on snapshot isolation: So I went to work to demonstrate row width impact on the version store — when only a tiny bit column is changed in the row. Here’s how I did the test: I created two tables, dbo.Narrow and dbo.Wide. They each each have a […]

Read More

Categories

April 2018
MTWTFSS
« Mar May »
 1
2345678
9101112131415
16171819202122
23242526272829
30