Press "Enter" to skip to content

Understanding Hash Match Aggregates

Itzik Ben-Gan continues his series on grouping and aggregating data by looking at the hash match aggregation process:

The estimated CPU cost for the Hash Aggregate in the plan for Query 8 is 0.166344, and in Query 9 is 0.16903.

It could be an interesting exercise to try and figure out exactly in what way the cardinality of the grouping set, the data types, and aggregate function used affect the cost; I just didn’t pursue this aspect of the costing. So, after making a choice of the grouping set and aggregate function for your query, you can reverse engineer the costing formula. For example, let’s reverse engineer the CPU costing formula for the Hash Aggregate operator when grouping by a single integer column and returning the MAX(orderdate) aggregate. The formula should be:

Operator CPU cost = <startup cost> + @numrows * <cost per row> + @numgroups * <cost per group>

Using the techniques that I demonstrated in the previous articles in the series, I got the following reverse engineered formula:

Operator CPU cost = 0.017749 + @numrows * 0.00000667857 + @numgroups * 0.0000177087

Definitely worth reading in detail.