Press "Enter" to skip to content

Category: Query Tuning

Working with INTERSECT and EXCEPT

Erik Darling wounds me:

I have never once seen anyone use these. The most glaring issue with them is that unlike a lot of other directives in SQL, these ones just don’t do a good job of telling you what they do, and their behavior is sort of weird.

Unlike EXISTS and NOT EXISTS, which state their case very plainly, as do UNION and UNION ALL, figuring these out is not the most straightforward thing. Especially since INTERSECT has operator precedence rules that many other directives do not.

I’ve used EXCEPT to check if two datasets are equivalent for testing purposes: A EXCEPT B should be zero rows, and B EXCEPT A should be zero rows. It has built-in handling of any NULL madness. Set intersections have their uses as well.
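
A minimal sketch of that equivalence check, with hypothetical table and column names:

    -- Both queries should return zero rows if the two result sets match.
    SELECT OrderID, CustomerID, OrderTotal FROM dbo.Orders_Old
    EXCEPT
    SELECT OrderID, CustomerID, OrderTotal FROM dbo.Orders_New;

    SELECT OrderID, CustomerID, OrderTotal FROM dbo.Orders_New
    EXCEPT
    SELECT OrderID, CustomerID, OrderTotal FROM dbo.Orders_Old;

One caveat: EXCEPT compares distinct rows, so it treats NULLs as equal to each other (that’s the NULL madness handled for you) but it won’t catch differences in duplicate-row counts.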

An Overview of Postgres’s Planner

Cary Huang digs into one phase of Postgres query processing:

When you send a query to PostgreSQL, it normally would go through stages of query processing and return you the results at the end. These stages are known as:

  • Parse
  • Analyze
  • Rewrite
  • Plan
  • Execute

I wrote another blog to briefly explain the responsibility of each query processing stage. You can find it here. In this blog, we will only focus on the “plan” stage or the “planner” module as this is perhaps the most interesting or complex stage if you will. I will share my understanding of the planner module as I investigate its internal workings to handle a simple sequential scan. This will be based on PostgreSQL 16.

Read on to learn what the planner does and how it works, at a high level.
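
If you want to poke at the planner’s output yourself, EXPLAIN shows the plan it settles on; the table name below is just a placeholder:

    -- Show the chosen plan without executing the query.
    EXPLAIN SELECT * FROM t WHERE id = 42;

    -- Include planning time, execution time, and buffer usage
    -- (this one actually runs the query).
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE id = 42;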

Sorting by Large Columns in SQL Server

Aaron Bertrand has a clever trick:

In the most basic cases, we can implement this functionality in SQL Server by using OFFSET/FETCH. The problem is that anything that uses TOP or OFFSET will potentially have to scan everything in the index up until the page requested, which means that queries become slower and slower for higher page numbers. To achieve anything close to linear performance, you need to have a narrow, covering index for each sort option, or use columnstore as Erik Darling recommends here, or concede that some searches are just going to be slow. Throw in additional filtering, pulling data from other tables, and letting users dictate any sort order they want, and it becomes that much harder to tune for all cases.

I have a lot that I want to say about paging, and I will follow up with more content soon. Several years ago, I wrote about some ways to reduce the pain here, and it is long overdue for a refresh. For today’s post, though, I wanted to talk specifically about pagination when you have to order by large values. By “large” I mean any data type that can’t fit in an index key, like nvarchar(4000) or, really, anything that can’t lead in an index and/or would push the key past 1,700 bytes.

Read on for the scenario and how it all works.
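
For reference, the basic OFFSET/FETCH pattern the article starts from looks like this; the table, columns, and page size are placeholders:

    DECLARE @PageNumber int = 500, @PageSize int = 25;

    SELECT p.PostID, p.Title, p.Body
    FROM dbo.Posts AS p
    ORDER BY p.Title   -- imagine a large nvarchar sort column
    OFFSET (@PageNumber - 1) * @PageSize ROWS
    FETCH NEXT @PageSize ROWS ONLY;

The further @PageNumber climbs, the more rows the engine has to read and discard before it can hand back the requested page, which is where the slowdown comes from.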

The Proper Use of Views and Inline UDFs

Erik Darling plays tic-tac-toe:

The problem is really the stuff that people stick into views. They’re sort of like a junk drawer for data. Someone builds a view that returns a correct set of results, which becomes a source of truth. Then someone else comes along and uses that view in another view, because they know it returns the correct results, and so on and so on. Worse, views tend to do a bunch of data massaging, left joining and coalescing and substringing and replacing and case expressioning and converting things to other things. The bottom line is that views are as bad as you make them.

The end result is a trash monster with a query plan that can only be viewed in full from deep space.

Read on to learn the use cases for views and inline UDFs, as well as a few important notes regarding performance of each. Views are like mogwai: they’re fine as long as you never get them wet and never let them eat after midnight. The problem is, far too many companies are apparently the business equivalent of all-you-can-eat buffets at water parks.

Inline user-defined functions are like patenting a device that lets you shoot yourself in both feet with one pull of the trigger. Which, if I understand things correctly, means you’ll need a Form 4 for each inline UDF.
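
For anyone who hasn’t written these in a while, here’s the shape of each; the objects are purely illustrative:

    -- A view: a named query with no parameters.
    CREATE VIEW dbo.ActiveCustomers
    AS
    SELECT c.CustomerID, c.CustomerName
    FROM dbo.Customers AS c
    WHERE c.IsActive = 1;
    GO

    -- An inline table-valued function: the same idea, but parameterized.
    -- Both get expanded into the calling query plan at optimization time.
    CREATE FUNCTION dbo.CustomersByRegion (@RegionID int)
    RETURNS TABLE
    AS
    RETURN
        SELECT c.CustomerID, c.CustomerName
        FROM dbo.Customers AS c
        WHERE c.RegionID = @RegionID;
    GO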

Hash Match and Stream Aggregate Operators in SQL Server

Andy Brownsword rounds up the usual suspects:

In the last post we looked at how the TOP and MAX operators compared. We saw that the execution plan for a MAX function used a Stream Aggregate operator, which is one of two operators we can use for aggregation.

I wanted to look at the two operators and how they perform the same tasks in different ways. The way they function is key to understanding why the engine may choose to use one over the other and the impact this can have on the performance of a query.

The two operators in question: the Hash Match (Aggregate) and the Stream Aggregate.

Read on for a discussion of how each operator works and when each makes sense for the optimizer to use.
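
As a rough illustration of when you tend to see each, assuming the StackOverflow Votes table from the series:

    -- With an index on PostId, rows arrive pre-sorted on the grouping
    -- column and a Stream Aggregate is the natural fit.
    SELECT v.PostId, COUNT(*) AS VoteCount
    FROM dbo.Votes AS v
    GROUP BY v.PostId;

    -- Grouping on a column with no supporting index typically pushes the
    -- optimizer toward a Hash Match (Aggregate), unless it decides an
    -- explicit Sort plus Stream Aggregate is cheaper.
    SELECT v.BountyAmount, COUNT(*) AS VoteCount
    FROM dbo.Votes AS v
    GROUP BY v.BountyAmount;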

Tips on Using Subqueries in the SELECT Clause

Erik Darling covers a “sometimes” topic:

I think subqueries in select lists are very neat things. You can use them to skip a lot of additional join logic, which can have all sorts of strange repercussions on query optimization, particularly if you have to use left joins to avoid eliminating results.

Subqueries do have their limitations:

  • They can only return one row
  • They can only return one column

But used in the right way, they can be an excellent method to retrieve results without worrying about what kind of join you’re doing, and how the optimizer might try to rearrange it into the mix.

Read on for a dive into this topic and a scenario in which subqueries in the SELECT clause can be faster than alternatives. My personal preference, unless there’s a major performance difference, is to keep the SELECT clause as simple as possible. But sometimes, the difference is stark enough to matter.
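
A minimal sketch of the pattern, using made-up tables:

    -- The subquery must return at most one row and one column per outer
    -- row; a correlated aggregate satisfies both constraints.
    SELECT
        c.CustomerID,
        c.CustomerName,
        (
            SELECT MAX(o.OrderDate)
            FROM dbo.Orders AS o
            WHERE o.CustomerID = c.CustomerID
        ) AS LastOrderDate
    FROM dbo.Customers AS c;

The equivalent left join needs a GROUP BY (or an OUTER APPLY with TOP), which is exactly the extra join logic the subquery form lets you skip.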

Enforcing Join Order in Postgres

Hans-Juergen Schoenig demands order:

What is the important observation here? Let us take a look at planning time. PostgreSQL needs 0.297 milliseconds to find the best execution plan (= execution strategy) to run the query. The question arising is: Where does the planner spend that time? The thing is: even when using explicit joins as shown above, PostgreSQL will join those tables implicitly and decide on the best join order. What does that mean in real life?

Read on to see what this means in practice and how you can control join order in Postgres. With SQL Server, there are various join hints that will force a specific join order.

As for the why, there are specific circumstances in which you might have more information than the optimizer and can come up with a superior way of joining tables together, especially as queries get more complicated. One of my favorite query tuning books is Dan Tow’s SQL Tuning, which is 20 years old at this point but still lays out a great way of thinking about how to attack the process of running a query. In that book, Dan uses several criteria to determine the table from which you want to drive a particular query, using factors like filters, the existence of foreign key constraints, etc. From there, you have a somewhat-deterministic way of defining the most efficient path for connecting the rest of the tables together. For most queries, especially in OLTP systems, this doesn’t matter very much in practice. But for warehouses, it can make a world of difference.
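
The usual lever on the Postgres side (and, I suspect, the one the article reaches for) is join_collapse_limit: set it to 1 and the planner keeps explicit JOINs in the order you wrote them. A sketch with placeholder tables:

    -- Tell the planner not to rearrange explicit JOIN clauses.
    SET join_collapse_limit = 1;

    EXPLAIN
    SELECT *
    FROM a
    JOIN b ON b.a_id = a.id
    JOIN c ON c.b_id = b.id;

    -- Back to the default (8) when finished experimenting.
    RESET join_collapse_limit;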

Comparing TOP(1) + ORDER BY vs MAX() Performance in SQL Server

Andy Brownsword breaks out the stopwatch:

The TOP clause limits the number of results which are returned from a query; in this instance we’re focussing on a single result. In contrast, when using MAX we’re applying a function to our data to select the largest value from our data.

Let’s dive into some examples with the StackOverflow data, specifically the Votes table.

Read on for several scenarios and how the two perform. Things get a bit more complicated as you introduce other tables in joins and similar additional factors, but this gives you a good foundation for comparison.
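
The two shapes being compared look roughly like this against the Votes table (the exact queries in the post may differ):

    -- MAX: ask the engine for the largest value directly.
    SELECT MAX(v.CreationDate) AS LatestVote
    FROM dbo.Votes AS v;

    -- TOP (1) + ORDER BY: sort descending and keep the first row.
    SELECT TOP (1) v.CreationDate AS LatestVote
    FROM dbo.Votes AS v
    ORDER BY v.CreationDate DESC;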

Query Compilation Timeouts and Query Store

Kendra Little diagnoses a problem:

Last November, a puzzle was really bothering me. Some queries from an application were timing out frequently after running for 30 seconds, but they were halfway invisible in the SQL Server.

I say “halfway invisible” because I could see the queries while they were running in SQL Server’s dynamic management views using free tools (sp_WhoIsActive and sp_BlitzWho).

But the queries had some odd characteristics:

Through the power of communication with other humans (eew, that idea sounds icky), Kendra was able to learn what the problem was and how you can track such issues outside of Query Store.
