Press "Enter" to skip to content

Category: Query Tuning

Comparing TOP(1) + ORDER BY vs MAX() Performance in SQL Server

Andy Brownsword breaks out the stopwatch:

The TOP clause limits the number of results which are returned from a query, in this instance we’re focussing on a single result. In contrast, when using MAX we’re applying a function to our data to select the largest value from our data.

Let’s dive into some examples with the StackOverflow data, specifically the Votes table.

Read on for several scenarios and how the two perform. Things get a bit more complicated as you introduce other tables in joins and similar additional factors, but this gives you a good foundation for comparison.

Comments closed

Query Compilation Timeouts and Query Store

Kendra Little diagnoses a problem:

Last November, a puzzle was really bothering me. Some queries from an application were timing out frequently after running for 30 seconds, but they were halfway invisible in the SQL Server.

I say “halfway invisible” because I could see the queries while they were running in SQL Server’s dynamic management views using free tools (sp_WhoIsActive and sp_BlitzWho).

But the queries had some odd characteristics:

Through the power of communication with other humans (eew, that idea sounds icky), Kendra was able to learn what the problem was and how you can track such issues outside of Query Store.

Comments closed

Search Patterns in T-SQL

Erik Darling puts on the fedora and grabs the bullwhip:

First, what you should not do: A universal search string:

The problem here is somewhat obvious if you’ve been hanging around SQL Server long enough. Double wildcard searches, searching with a string type against numbers and dates, strung-together OR predicates that the optimizer will hate you for.

These aren’t problems that other things will solve either. For example, using CHARINDEX or PATINDEX isn’t a better pattern for double wildcard LIKE searching, and different takes on how you handle parameters being NULL don’t buy you much.

Read on for an example of a terrible search query, a mediocre search query, a good search query, and a possible unicorn: an actually valid reason to use a non-clustered columnstore index.

Comments closed

Indexing for Window Functions

Erik Darling talks window functions:

A lot of the time, the answer to performance issues with ranking windowing functions is simply to get Batch Mode involved. Where that’s not possible, you may have to resort to adding indexes.

Sometimes, even with Batch Mode, there is additional work to be done, but it really does get a lot of the job done.

In this post I’m going to cover some of the complexities of indexing for ranking windowing functions when there are additional considerations for indexing, like join and where clause predicates.

Click through for an in-depth article with plenty of good information.

Comments closed

Checking for the Existence of Multiple Values in SQL

Lukas Eder does a performance test:

But what if you want to check if there are at least 2 (or N) rows? In that case, you cannot use EXISTS, but have to revert to using COUNT(*). However, instead of just counting all matches, why not add a LIMIT clause as well? So, if you want to check if actors called WAHLBERG have played in at least 2 films, instead of this:

Lukas compares performance between two query options in Postgres, Oracle, SQL Server, and MySQL. Because Oracle has weird behavior in the test, Lukas shares a third option that works well for it.

Comments closed

Indexing for Sorted Data

Erik Darling digs into indexes and sorts:

Without things sorted the way you’re looking for them, it’s a lot like hitting shuffle until you get to the song you want. Who knows when you’ll find it, or how many clicks it will take to get there.

The longer your playlist is, well, you get the idea. And people get all excited about Skip Scans. Sheesh.

Anyway, let’s look at poor optimizer choices, and save the poor playlist choices for another day.

Click through for an interesting problem and solution.

Comments closed

Avoiding Time Intelligence DAX Functions in DirectQuery Mode

Marco Russo and Alberto Ferrari skip the slow stuff:

Calculations that use the DAX time intelligence functions mostly retrieve data at the day level, performing the required aggregations in the formula engine. By avoiding time intelligence DAX functions, you can force DAX to produce more optimized queries for your specific calculations.

DirectQuery over SQL and VertiPaq require the same patterns to optimize time intelligence calculations, even though the reasons are different. In VertiPaq, we try to stay away from DAX time intelligence functions to avoid large materialization at the day level. With SQL, materialization does not always happen because Tabular tries to push the grouping down to SQL. Still, time intelligence calculations often result in complex queries, and it is better to avoid the complexity by using simpler DAX code.

Check out the performance difference.

2 Comments

Troubleshooting a Slow Deletion

Aaron Bertrand has an admission:

Before looking at the code path, the query, or the execution plan, I didn’t even believe the application would regularly perform a hard delete. Teams typically soft delete “expensive” things that are ever-growing (e.g., change an IsActive column from 1 to 0). Deleting a user is bound to be expensive, because there are usually many inbound foreign keys that have to be validated for the delete to succeed. Also, every index has to be updated as part of the operation. On top of that, there are often triggers that fire on delete.

While I know that we do sometimes soft delete users, the engineer assured me that the application does, in some cases, hard delete users.

Click through for the full story and a minor bout of self-petard-hosting. I’m as guilty as anyone else of jumping to conclusions, and this is a good reminder to go through the process even when you think you know the answer.

Comments closed

Weirdness with Aggregation

Erik Darling digs into a problem. Part 1 sets up the scenario:

Here’s the query plan, which yes, you’re reading correctly, runs for ~23 seconds, fully joining both tables prior to doing the final aggregation.

I’m showing you a little extra here, because there are missing index requests that the optimizer asks for, but we’ll talk about those in tomorrow’s post.

The wait stats for this query, since it’s running in Batch Mode, are predictably HT-related.

Part 2 covers those missing indexes:

I’ve taken a small bit of artistic license with them.

The crappy thing is… They really do not help and in some cases things get substantially worse.

Maybe it’s because it’s early and I’m trying to compile things in my head rather than actually trying it out, but it seems like a combo of CTE + CROSS APPLY or a pair of CROSS APPLY statements could work better (especially with a good index), assuming that join doesn’t need to be in place. Given the query as it is, with two MAX() aggregations and no GROUP BY clause, that could be an avenue for improvement, though one I have not actually tested. Nonetheless, read both of Erik’s posts.

Comments closed