How The Hash Match Join Operation Works

Dmitry Piliugin explains what happens when SQL Server calls for a hash match to join two tables together:

Hash Match in the join mode consumes two inputs, as we are joining two tables. The main idea is to build the hash table using the first “build” input, and then apply the same approach hash the second “probe” input to see if there will be matches of hashed values.

Query Processor (QP) is doing many efforts while building the plan to choose the correct join order. From the Hash Match prospective, it means that QP should choose what table is on the Build side and what is on the Probe side. The Build size should be smaller as it will be stored in memory when building a hash table.

Building a hash table begins with hashing join key values of the build table and placing them to one or another bucket depending on the hash value. Then QP starts processing the probe side, it applies the same hash function to the probe values, determining the bucket and compares the values inside of the bucket. If there is a match – the row is returned.

That would be the whole story if we had infinite memory, but in the real world, it is not true. More to the point, SQL Server allocates memory to the query before the execution starts and does not change it during the execution. That means that if the allocated memory amount is much less than the data size came during the execution, a Hash Match should be able to partition the joining data, and process it in portions that fit allocated memory, while the rest of the data is spilled to the disk waiting to be processed. Here is where the dancing begins.

Read on to learn more about the details of this operation.

Related Posts

When A Procedure Has Multiple Plan Cache Entries

Arthur Daniels shows that multi-statement stored procedures can have multiple entries in the plan cache: So we have two entries for this stored procedure. I included the statement sql handle to show that each statement handle has its own text. Let’s parse that text to see each statement. I copied the parsing SQL from this Plan […]

Read More

Computer Internals and the Buffer Pool

Randolph West starts a new series on the buffer pool in SQL Server: Now that we’ve reminded ourselves of those fundamentals, let’s take a closer look at the buffer pool. The buffer pool in SQL Server resides in the computer’s main memory (RAM). When the database engine requests a data page for reading or writing, […]

Read More

Categories

April 2018
MTWTFSS
« Mar May »
 1
2345678
9101112131415
16171819202122
23242526272829
30