Understanding Hash Match Aggregates

Itzik Ben-Gan continues his series on grouping and aggregating data by looking at the hash match aggregation process:

The estimated CPU cost for the Hash Aggregate in the plan for Query 8 is 0.166344, and in Query 9 is 0.16903.

It could be an interesting exercise to try and figure out exactly in what way the cardinality of the grouping set, the data types, and aggregate function used affect the cost; I just didn’t pursue this aspect of the costing. So, after making a choice of the grouping set and aggregate function for your query, you can reverse engineer the costing formula. For example, let’s reverse engineer the CPU costing formula for the Hash Aggregate operator when grouping by a single integer column and returning the MAX(orderdate) aggregate. The formula should be:

Operator CPU cost = <startup cost> + @numrows * <cost per row> + @numgroups * <cost per group>

Using the techniques that I demonstrated in the previous articles in the series, I got the following reverse engineered formula:

Operator CPU cost = 0.017749 + @numrows * 0.00000667857 + @numgroups * 0.0000177087

Definitely worth reading in detail.

Related Posts

Understanding Analysis Services Memory Behavior

Shabnam Watson walks us through SQL Server Analysis Services memory settings and application behavior under memory stress: If memory consumption is below the Low limit everything is fine and it is free to stay in memory. Once the consumption passes the Low limit a cleaner thread wakes up and tries to clean up memory. At this point […]

Read More

How SQL Server Implements Index Spooling

Hugo Kornelis has a long article which dives into the way SQL Server handles index spooling: A Table Spool operator stores its data in a worktable that is structured as a clustered index. The index is not built on any of the columns in the data, nor on any artificially added columns. It is structured […]

Read More


June 2018
« May Jul »