Press "Enter" to skip to content

Category: Query Tuning

Manual Halloween Problem Protection

Jared Poche takes us through Halloween problem protection and builds out his own method, with blackjack and hookers:

Well, if SQL Server is trying to separate the read from the write, why don’t I just do that myself? I had the idea to read the data I needed in an INSERT…SELECT statement, writing into a memory-optimized table variable (motv). I could make sure the read included all the columns I need to calculate the new value, including the CASE statement I had in the SET clause of the UPDATE.

Click through for Jared’s explanation of how to implement it, as well as the circumstances in which it might be faster than what you get by letting a single T-SQL statement handle the job.
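
To give a sense of the shape of this pattern, here is a simplified sketch (not Jared's actual code; the table, column, and type names are made up, and the memory-optimized table type assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup):

-- One-time setup: a memory-optimized table type to hold the pre-read rows.
CREATE TYPE dbo.BalanceWork AS TABLE
(
    AccountId      int   NOT NULL PRIMARY KEY NONCLUSTERED,
    CurrentBalance money NOT NULL,
    Adjustment     money NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON);
GO

DECLARE @work dbo.BalanceWork;

-- Step 1: the read, pulling everything needed to calculate the new value,
-- including the CASE logic that used to live in the UPDATE's SET clause.
INSERT @work (AccountId, CurrentBalance, Adjustment)
SELECT a.AccountId,
       a.Balance,
       CASE WHEN a.Balance < 0 THEN 0 ELSE a.PendingAdjustment END
FROM dbo.Accounts AS a
WHERE a.PendingAdjustment <> 0;

-- Step 2: the write, using the values captured in step 1 rather than
-- recalculating them while the update is in flight.
UPDATE a
SET a.Balance = w.CurrentBalance + w.Adjustment
FROM dbo.Accounts AS a
JOIN @work AS w
  ON w.AccountId = a.AccountId;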


Fixing the Parallelism Documentation

Erik Darling shreds the docs:

The section with the weirdest errors and omissions is right up at the top. I’m going to post a screenshot of it, because I don’t want the text to appear here in a searchable format.

That might lead people not reading thoroughly to think that I condone any of it, when I don’t.

Erik pulls no punches on this post. Hopefully the end result is that this part of the documentation improves.


Optimizing for Mediocre

Erik Darling is always optimal:

Using the OPTIMIZE FOR UNKNOWN hint and declaring variables inside of a code block to be used in a where clause have the same issue, though: they make SQL Server’s query optimizer make bad guesses, which often lead to bad execution plans.

You can read great detail about that here.

Read on for a bit of a deserved rant and an example to show why OPTIMIZE FOR UNKNOWN often doesn’t solve the problem.
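
If you haven't run into it, here is a minimal sketch of the two patterns in question (the procedure names are made up, and dbo.Posts is the Stack Overflow-style table used elsewhere on this page):

-- Pattern 1: OPTIMIZE FOR UNKNOWN throws away the sniffed parameter value.
CREATE OR ALTER PROCEDURE dbo.PostsByOwner_Unknown
    @OwnerUserId int
AS
BEGIN
    SELECT p.Id, p.Score
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @OwnerUserId
    OPTION (OPTIMIZE FOR (@OwnerUserId UNKNOWN));
END;
GO

-- Pattern 2: copying the parameter into a local variable has the same effect,
-- because the optimizer can't see the variable's value at compile time and
-- falls back to a density-based guess.
CREATE OR ALTER PROCEDURE dbo.PostsByOwner_Local
    @OwnerUserId int
AS
BEGIN
    DECLARE @Owner int = @OwnerUserId;

    SELECT p.Id, p.Score
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @Owner;
END;
GO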


Pattern Searches on Numbers

Erik Darling does an odd search:

Most of the time, it’s some query that looks like this:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.OwnerUserId LIKE '2265%';

The cruddy thing here is that… SQL Server doesn’t handle this well.

At first, my thought was, “Why not just use an inequality operator like p.OwnerUserId >= 22650000 AND p.OwnerUserId < 22660000?” but then I remembered that we’re probably dealing with multiple orders of magnitude here, and the whole thing made my head hurt a little.
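
For the record, covering every order of magnitude with ranges would look something like this for an integer column, which is exactly the part that makes your head hurt (a sketch, not from Erik's post):

SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.OwnerUserId = 2265
   OR (p.OwnerUserId >= 22650     AND p.OwnerUserId < 22660)
   OR (p.OwnerUserId >= 226500    AND p.OwnerUserId < 226600)
   OR (p.OwnerUserId >= 2265000   AND p.OwnerUserId < 2266000)
   OR (p.OwnerUserId >= 22650000  AND p.OwnerUserId < 22660000)
   OR (p.OwnerUserId >= 226500000 AND p.OwnerUserId < 226600000);
-- Covers 4- through 9-digit values; a 10-digit id starting with 2265 won't fit in an int anyway.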


Building Sets with GENERATE_SERIES

Aaron Bertrand is a fan of this new function:

I have come across a lot of use cases for manufacturing rows on the fly, aside from the common goal of populating a large data set such as a numbers or calendar table. A few favorites include building sample data, pivoting an unknown number of columns, data extrapolation, and filling gaps in date or time ranges.

If you are on SQL Server 2022 or Azure SQL Database, or have been reading up on new features, you’ve likely heard about one of the better T-SQL enhancements: a new built-in function called GENERATE_SERIES. The syntax is straightforward – it accepts arguments for start and stop, and an optional argument to indicate step (in case you want to iterate by more than 1, or backwards):
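
A couple of quick examples of that syntax, in case you haven't kicked the tires yet (these are mine, not Aaron's):

-- 1 through 10
SELECT gs.value
FROM GENERATE_SERIES(1, 10) AS gs;

-- 10 down to 1, stepping backwards
SELECT gs.value
FROM GENERATE_SERIES(10, 1, -1) AS gs;

-- every day in 2024, built from a series of day offsets
SELECT DATEADD(DAY, gs.value, '20240101') AS TheDate
FROM GENERATE_SERIES(0, 365) AS gs;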

Click through to see how performance for this compares to two methods we’ve used in the past to generate similar results.


Overwhelming the IN Operator

Tariq Rasheed Al-Qaralleh recounts a customer problem:

In some cases, like when customers use object-relational mapping (ORM) tools such as Entity Framework or LINQ, part of the code will eventually be converted to an executable T-SQL statement.

For example, a LINQ query with the .Where() method becomes a T-SQL query with a WHERE clause:

Query Syntax and Method Syntax in LINQ (C#) | Microsoft Learn

Write LINQ queries in C# | Microsoft Learn

Read on for some of the practical consequences of doing this, including performance issues and possibly even runtime errors.

Tariq gives a couple examples of how to fix the issue, and a third possible fix is to pass in the IN clause as a table-valued parameter and join to that TVP.
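
That third option looks roughly like this (the type and procedure names here are made up for illustration):

-- One-time setup: a table type to carry the list of values.
CREATE TYPE dbo.IntList AS TABLE (Id int NOT NULL PRIMARY KEY);
GO

CREATE OR ALTER PROCEDURE dbo.GetPostsByOwners
    @Owners dbo.IntList READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- Join to the TVP instead of generating a giant IN (...) list.
    SELECT p.Id, p.OwnerUserId, p.CreationDate
    FROM dbo.Posts AS p
    JOIN @Owners AS o
      ON o.Id = p.OwnerUserId;
END;
GO

-- The caller fills the TVP and passes the whole list as a single parameter.
DECLARE @Owners dbo.IntList;
INSERT @Owners (Id) VALUES (22656), (1144035), (3094);
EXEC dbo.GetPostsByOwners @Owners = @Owners;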


Rewriting Scalar UDFs and NULL Results

Erik Darling jumps out of the time machine to warn us, but we have no idea what he’s talking about so we ignore the warning and end up doing the thing he warned us not to do, causing us to need to send him into a time machine to warn us not to do it:

I think I have probably spent 500 hours of my life rewriting T-SQL Scalar UDFs to avoid all the performance problems associated with them.

The obvious choice is the Inline Table Valued Function, which has fewer performance issues baked in. For the kids out there: they don’t spill trauma.

But getting the rewrite right can be tricky, especially when it’s possible for the function to return NULL values.

Click through for the example.
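
To give a flavor of the kind of rewrite Erik is talking about, here is a simplified sketch (not his example; the function, table, and column names are stand-ins):

-- The scalar UDF version.
CREATE OR ALTER FUNCTION dbo.CleanValue (@s nvarchar(100))
RETURNS nvarchar(100)
AS
BEGIN
    RETURN NULLIF(LTRIM(RTRIM(@s)), N'');
END;
GO

-- The inline table-valued rewrite; callers switch to CROSS APPLY.
CREATE OR ALTER FUNCTION dbo.CleanValue_Inline (@s nvarchar(100))
RETURNS TABLE
AS
RETURN
    SELECT CleanValue = NULLIF(LTRIM(RTRIM(@s)), N'');
GO

-- This version always returns exactly one row, so CROSS APPLY doesn't silently
-- drop rows when the computed value is NULL; an inline function that can return
-- zero rows is where these rewrites tend to go wrong.
SELECT u.Id, cv.CleanValue
FROM dbo.Users AS u
CROSS APPLY dbo.CleanValue_Inline(u.Location) AS cv;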


Performance Tuning a Dedicated SQL Pool

Sarath Sasidharan has some guidance for us:

Synapse dedicated pools have been battle-tested at enterprise customers across the globe. We deal with data on the order of petabytes. Synapse can provide you with the scale of the cloud and the high performance required for your enterprise-grade requirements. The key to maximizing your performance is to follow best practices; check out best practices for dedicated SQL pools in Azure Synapse Analytics.

Failure to do so causes performance issues. In such scenarios, it is important to understand where the bottlenecks are. This blog focuses on the different steps a query goes through, from the time the query is fired from the client until it returns. A delay in any of these steps impacts the overall run time of the query and shows up as degraded performance.

Click through for a walkthrough of each step along the way, potential problems you could run into, and remediations for those problems. Much of the advice is similar to what you’d get with SQL Server, though there are differences interspersed throughout each level.


Speeding Up Queries via IF EXISTS

Chad Callihan doesn’t need to wait for the query’s end credits sequence:

When checking for the existence of a value or values in a table, typically the whole table does not need to be read. The goal is to get a true or false answer: whether a criterion is met or not. If the criterion is met within the first few records, there’s no need to read on. Nonetheless, you may come across scripts written to unnecessarily retrieve the complete count of a value in a table. Let’s compare the performance difference between using COUNT(*) and using IF EXISTS when checking a table for values.

One’s going to give you a full scan and the other will give you a semi-join. Read on to see what the practical effect of this is.
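
The comparison boils down to something like this (the table and predicate are illustrative):

-- Counts every matching row, even though we only need a yes/no answer.
IF (SELECT COUNT(*) FROM dbo.Posts AS p WHERE p.OwnerUserId = 22656) > 0
    PRINT 'Found';

-- Can stop at the first matching row.
IF EXISTS (SELECT 1 FROM dbo.Posts AS p WHERE p.OwnerUserId = 22656)
    PRINT 'Found';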


Aggregations and Distinct Counts in Power BI

Phil Seamark doesn’t have time to wait for Power BI to count:

This article aims to show how you can speed up distinct count calculations in Power BI using the built-in user-defined aggregations feature. The user-defined aggregation feature in Power BI is designed to work with direct query models and usually gets used for calculations such as SUM, MIN, MAX etc. However, it can also work well for distinct count calculations using the pattern shown in this article.

It’s an interesting partial aggregation approach which works really well when the distinct count is a small percentage of the total.
