Press "Enter" to skip to content

Eager Aggregation in SQL Queries

Boris Novikov talks about an uncommon topic:

In this article we discuss one type of query transformation that most optimizers do not use. Because of this, it can be beneficial for you to rewrite a query to help the optimizer order operations in a way that can be beneficial.

An analytical query is supposed to produce some kind of summary generalizing properties of huge amounts of data but at the same time should be compact and easy for humans to understand. In terms of the SQL query language this means that any analytical query extracts and combines large number of rows and then uses aggregate functions with or even without GROUP BY clause. More specifically, we consider queries that contain many JOIN operations followed by aggregation. Usually, queries are written in this way and, surprisingly, the optimizers choose the best order of joins but leave the aggregation as the last step.

Read on for more information, including a minor lamentation that the various relational database optimizers tend not to perform this kind of operation. In SQL Server, I have an example of this pre-aggregation using the APPLY operator (with demo code here) and a simple but realistic example of how drastic the savings can be.