Press "Enter" to skip to content

Category: Query Tuning

Hash Match and Stream Aggregate Operators in SQL Server

Andy Brownsword rounds up the usual suspects:

In the last post we looked at how TOP and MAX operators compared. We saw the execution plan for a MAX function used a Stream Aggregate operator which is one of two which we can use for aggregation

I wanted to look at the two operators and how they perform the same tasks in different ways. The way they function is key to understanding why the engine may choose to use one over the other and the impact this can have on the performance of a query.

The two operators in question: the Hash Match (Aggregate) and Stream Aggregate

Read on for a discussion of how each operator works and when each makes sense for the optimizer to use.

Leave a Comment

Tips on Using Subqueries in the SELECT Clause

Erik Darling covers a “sometimes” topic:

I think subqueries in select lists are very neat things. You can use them to skip a lot of additional join logic, which can have all sorts of strange repercussions on query optimization, particularly if you use have to use left joins to avoid eliminating results.

Subqueries do have their limitations:

  • They can only return one row
  • They can only return one column

But used in the right way, they can be an excellent method to retrieve results without worrying about what kind of join you’re doing, and how the optimizer might try to rearrange it into the mix.

Read on for a dive into this topic and a scenario in which subqueries in the SELECT clause can be faster than alternatives. My personal preference is, unless there’s a major performance difference, I’d rather have the SELECT clause be as simple as possible. But sometimes, the difference is stark enough to matter.

Leave a Comment

Enforcing Join Order in Postgres

Hans-Juergen Schoenig demands order:

What is the important observation here? Let us take a look at planning time. PostgreSQL needs 0.297 milliseconds to find the best execution plan (= execution strategy) to run the query. The question arising is: Where does the planner need the time to plan the query? The thing is: Even when using explicit joins as shown above PostgreSQL will join those tables implicitly and decide on the best join order. What does that mean in real life?

Read on to see what this means in practice and how you can control join order in Postgres. With SQL Server, there are various join hints that will force a specific join order. As for the why, there are specific circumstances in which you might have more information than the optimizer and can come up with a superior way of joining tables together, especially as queries get more complicated. One of my favorite query tuning books is Dan Tow’s SQL Tuning, which is 20 years old at this point but still lays out a great way of thinking about how to attack the process of running a query. In that book, Dan uses several criteria to determine the table from which you want to drive a particular query, using factors like filters, the existence of foreign key constraints, etc. From there, you have a somewhat-deterministic way of defining the most efficient path for connecting the rest of the tables together. For most queries, especially in OLTP systems, this doesn’t matter very much in practice. But for warehouses, it can make a world of difference.

Leave a Comment

Comparing TOP(1) + ORDER BY vs MAX() Performance in SQL Server

Andy Brownsword breaks out the stopwatch:

The TOP clause limits the number of results which are returned from a query, in this instance we’re focussing on a single result. In contrast, when using MAX we’re applying a function to our data to select the largest value from our data.

Let’s dive into some examples with the StackOverflow data, specifically the Votes table.

Read on for several scenarios and how the two perform. Things get a bit more complicated as you introduce other tables in joins and similar additional factors, but this gives you a good foundation for comparison.

Leave a Comment

Query Compilation Timeouts and Query Store

Kendra Little diagnoses a problem:

Last November, a puzzle was really bothering me. Some queries from an application were timing out frequently after running for 30 seconds, but they were halfway invisible in the SQL Server.

I say “halfway invisible” because I could see the queries while they were running in SQL Server’s dynamic management views using free tools (sp_WhoIsActive and sp_BlitzWho).

But the queries had some odd characteristics:

Through the power of communication with other humans (eew, that idea sounds icky), Kendra was able to learn what the problem was and how you can track such issues outside of Query Store.

Leave a Comment

Search Patterns in T-SQL

Erik Darling puts on the fedora and grabs the bullwhip:

First, what you should not do: A universal search string:

The problem here is somewhat obvious if you’ve been hanging around SQL Server long enough. Double wildcard searches, searching with a string type against numbers and dates, strung-together OR predicates that the optimizer will hate you for.

These aren’t problems that other things will solve either. For example, using CHARINDEX or PATINDEX isn’t a better pattern for double wildcard LIKE searching, and different takes on how you handle parameters being NULL don’t buy you much.

Read on for an example of a terrible search query, a mediocre search query, a good search query, and a possible unicorn: an actually valid reason to use a non-clustered columnstore index.

Comments closed

Indexing for Window Functions

Erik Darling talks window functions:

A lot of the time, the answer to performance issues with ranking windowing functions is simply to get Batch Mode involved. Where that’s not possible, you may have to resort to adding indexes.

Sometimes, even with Batch Mode, there is additional work to be done, but it really does get a lot of the job done.

In this post I’m going to cover some of the complexities of indexing for ranking windowing functions when there are additional considerations for indexing, like join and where clause predicates.

Click through for an in-depth article with plenty of good information.

Comments closed

Checking for the Existence of Multiple Values in SQL

Lukas Eder does a performance test:

But what if you want to check if there are at least 2 (or N) rows? In that case, you cannot use EXISTS, but have to revert to using COUNT(*). However, instead of just counting all matches, why not add a LIMIT clause as well? So, if you want to check if actors called WAHLBERG have played in at least 2 films, instead of this:

Lukas compares performance between two query options in Postgres, Oracle, SQL Server, and MySQL. Because Oracle has weird behavior in the test, Lukas shares a third option that works well for it.

Comments closed

Indexing for Sorted Data

Erik Darling digs into indexes and sorts:

Without things sorted the way you’re looking for them, it’s a lot like hitting shuffle until you get to the song you want. Who knows when you’ll find it, or how many clicks it will take to get there.

The longer your playlist is, well, you get the idea. And people get all excited about Skip Scans. Sheesh.

Anyway, let’s look at poor optimizer choices, and save the poor playlist choices for another day.

Click through for an interesting problem and solution.

Comments closed