Press "Enter" to skip to content

Category: Query Tuning

Eager Aggregation in SQL Queries

Boris Novikov talks about an uncommon topic:

In this article we discuss one type of query transformation that most optimizers do not use. Because of this, it can be beneficial for you to rewrite a query to help the optimizer order operations in a way that can be beneficial.

An analytical query is supposed to produce some kind of summary generalizing properties of huge amounts of data but at the same time should be compact and easy for humans to understand. In terms of the SQL query language this means that any analytical query extracts and combines large number of rows and then uses aggregate functions with or even without GROUP BY clause. More specifically, we consider queries that contain many JOIN operations followed by aggregation. Usually, queries are written in this way and, surprisingly, the optimizers choose the best order of joins but leave the aggregation as the last step.

Read on for more information, including a minor lamentation that the various relational database optimizers tend not to perform this kind of operation. In SQL Server, I have an example of this pre-aggregation using the APPLY operator (with demo code here) and a simple but realistic example of how drastic the savings can be.

Comments closed

Local Variables in Stored Procedures

Erik Darling does not approve:

Like many other things we’ve discussed thus far, local variables are a convenience to you that have behavior many people are still shocked by.

You, my dear and constant reader, may not be shocked, but the nice people who pay me money to fix things seem quite astounded by what happens when you invoke local variables.

So I find myself in a difficult position: do I dredge up more red meat for the millions of die-hard SQL Server performance nuts who come here for the strange and outlandish, or produce evergreen content for people who pay my substantial bar tabs.

You have at least a 50% chance to guess what Erik does next.

Comments closed

Window Function Execution Plans with RANGE

Hugo Kornelis continues a series on explaining the execution plans for window functions:

This is part twenty-six of the plansplaining series. And already the fourth episode about window functions. The first of those posts covered basic window functions; the second post focused on fast-track optimization for running aggregates, and the third post explained how the optimizer works around the lack of execution plan support for UNBOUNDED FOLLOWING.

But all of those were about OVER specifications that use the ROWS keyword. Let’s now look at the alternative, the RANGE keyword.

Click through to see how the various options work with RANGE. By the way, I still want range intervals, like how Postgres implements them, where you can define an interval of X days/hours/minutes/whatever rather than a specific number of rows. Maybe one of these versions…

Comments closed

Anti-Joins and Folding in Power Query

Chris Webb has a workaround:

Power Query allows you to merge (“join” in database terms) two tables together in a variety of different ways, including left and right anti joins. Unfortunately, as I found recently, anti joins don’t fold on SQL Server-related data sources, which can result in performance problems. Luckily there is a different way of doing anti joins that does fold.

An anti-join, by the way, is the type of thing you use when performing a NOT EXISTS operation: what is in driver table A that is not in lookup table B given some condition set?

Comments closed

Optimizing Sort Operators in Window Functions

Andy Brownsword talks about window function query tuning:

We’re on quite a roll with window functions these past few weeks. Last week we looked at the operators we’d see in execution plans when using a window function. This week I wanted to tackle one of the more troublesome ones specifically: the Sort operator.

We know that sort operators are expensive in our queries. To use a window function our data needs to be sorted. How about if we need multiple functions? What if we’d like the output sorted too? Can we optimise any of those out of the execution plan?

Read on for several tips.

Comments closed

Set-Based vs Row-Based Code Considerations

Kevin Hill explains a concept:

In SQL Server, the terms “set-based” and “row-based” refer to different approaches or styles of writing SQL code to manipulate data. These styles have implications for performance, readability, and the way queries are processed. Let’s explore the differences between set-based and row-based code:

Click through for Kevin’s thoughts. One thing I’d re-emphasize (because Kevin did make this point), especially for people coming to SQL Server from Oracle, is that set-based operations are going to be more efficient about 95-99% of the time than their row-based equivalents. Oracle has a large number of optimizations to make cursor-style code efficient and T-SQL has very few of those, as set-based is the more natural expression of SQL.

One quick example of this is, prior to SQL Server 2012 and its extended support of window functions, the fastest officially supported way to calculate a running total was to build a cursor. The other alternatives, including self-joins, were much less efficient. There was an unsupported but much faster technique that relied on a peculiarity of how SQL Server sorts clustered indexes (the “quirky update” method), but because it relied on internals that could change with any patch, it was a risky maneuver.

Comments closed

Reviewing an Execution Plan for a Window Function

Andy Brownsword categorizes the components:

Using last week’s sample data we can run the query below to demonstrate operators typically used for a window function:

The result of this query is a set of data with a running total of the Sales Value within each Financial Quarter.

We’ll follow the data through some of the operators in this execution plan to understand their part in the function. As with regular execution plans we’ll be working from right to left.

Read on for the key operators.

Comments closed

Self-Join Optimizations and Index Intersection

Daniel Hutmacher shows off a possibility:

This blog post started as a “what if” contemplation in my head: Suppose you have a reasonably large table with a clustered index and a number of non-clustered indexes. If your WHERE clause filters by multiple columns covered by those non-clustered indexes, could it potentially be faster to rewrite that WHERE clause to use those non-clustered indexes?

The answer might surprise you.

To be honest, when I learned about the concept originally, I expected that there would be a great deal of use cases for it. But SQL Server rarely comes up with this answer on its own and I think that’s because in most scenarios, we’d need to do additional sorting or other expensive operations to get the multiple indexes aligned just right to make this the faster option.

Comments closed

Fast-Track Optimization and Window Functions

Hugo Kornelis digs into a performance improvement the SQL Server optimizer has for window functions:

This is part twenty-four of the plansplaining series. In the previous part, I explained the execution plans for basic window functions, with and without a window frame. Especially the latter group performed quite poorly in the examples. So let’s now look at an optimization that SQL Server can apply to most cases, that prevents this rather bad scaling.

Click through to see what this is, how it works, and when the optimizer is able to use it.

Comments closed

Strategies for Filtering on Bit Columns

Aaron Bertrand answers a question:

Recently someone posted a question where they couldn’t quite figure out how to construct a predicate based on a bit parameter. They tried to write a procedure like this, which wouldn’t parse, of course:

At first, I thought Aaron meant querying integer bitmasks in T-SQL, in which case, the best strategy is “don’t.” But this is a different and much more useful scenario.

Comments closed