A Thought on Query Granularity

Chris Johnson shares some thoughts:

This query is pretty simple, we’re wanting to return a set of data about the orders placed including the total cost of each order. However the orders in the database are split into an Orders table that holds the order level information, and an OrderLines table that holds information on each product ordered in each order, and we need that OrderLines table to get the cost.

So we write our query, and we join to the OrderLines table, and we sum the cost. But then, because we have an aggregate, we need to group by every other column we’re returning. And to me that just doesn’t look right, and it doesn’t convey the intention of the query properly.

In Chris’s simple example, I’m not sure I’d push it very much, but Chris does have a good point in terms of explaining query intent. Also, depending on how many order lines there are relative to orders (the next step in the chain for that query), aggregation in a common table expression could be faster than waiting until after the join to aggregate on all of the columns. In reality, that’s the most likely reason I’d make this change, assuming that it made a big enough performance difference. But if you take a much more complicated query of this sort, then I’d be more amenable to the argument.

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30