Press "Enter" to skip to content

Category: Query Tuning

Building Sets with GENERATE_SERIES

Aaron Bertrand is a fan of this new function:

I have come across a lot of use cases for manufacturing rows on the fly, aside from the common goal of populating a large data set such as a numbers or calendar table. A few favorites include building sample data, pivoting an unknown number of columns, data extrapolation, and filling gaps in date or time ranges.

If you are on SQL Server 2022 or Azure SQL Database, or have been reading up on new features, you’ve likely heard about one of the better T-SQL enhancements: a new built-in function called GENERATE_SERIES. The syntax is straightforward – it accepts arguments for start and stop, and an optional argument to indicate step (in case you want to iterate by more than 1, or backwards):

Click through to see how performance for this compares to two methods we’ve used in the past to generate similar results.

Comments closed

Overwhelming the IN Operator

Tariq Rasheed Al-Qaralleh recounts a customer problem:

In some cases, like when customers use Object-relational Mapping tools (ORM) Like Entity framework or LINQ, part of the code at the end will be converted to a TSQL executable statement.

For example, The LINQ with  .Where () method will be a TSQL Query with a Where clause :

Query Syntax and Method Syntax in LINQ (C#) | Microsoft Learn

Write LINQ queries in C# | Microsoft Learn

Read on for some of the practical consequences of doing this, including performance issues and possibly even runtime errors.

Tariq gives a couple examples of how to fix the issue, and a third possible fix is to pass in the IN clause as a table-valued parameter and join to that TVP.

Comments closed

Rewriting Scalar UDFs and NULL Results

Erik Darling jumps out of the time machine to warn us, but we have no idea what he’s talking about so we ignore the warning and end up doing the thing he warned us not to do, causing us to need to send him into a time machine to warn us not to do it:

I think I have probably spent 500 hours of my life rewriting T-SQL Scalar UDFs to avoid all the performance problems associated with them.

The obvious choice is the Inline Table Valued Function, which has fewer performance issues baked in. For the kids out there: they don’t spill trauma.

But getting the rewrite right can be tricky, especially when it’s possible for the function to return NULL values.

Click through for the example.

Comments closed

Performance Tuning a Dedicated SQL Pool

Sarath Sasidharan has some guidance for us:

Synapse Dedicated pools have been battle tested at enterprise customers across the globe. We deal with data in the magnitude of PetaBytes. Synapse can provide you with the scale of the cloud and the high performance required for your enterprise-grade requirements.  The key to maximizing your performance is to follow best practices, check out best practices for dedicated SQL pools in Azure Synapse Analytics

Failure to do so causes performance issues. In such scenarios, is it important to understand where the bottlenecks are. This blog focuses on the different steps a query goes through; from the time the query is fired from the client until it returns back.  Delay caused in any of the steps would impact the overall run-time of the query and hence indicate degraded performance.

Click through for a walkthrough of each step along the way, potential problems you could run into, and remediations for those problems. Much of the advice is similar to what you’d get with SQL Server, though there are differences interspersed throughout each level.

Comments closed

Speeding Up Queries via IF EXISTS

Chad Callihan doesn’t need to wait for the query’s end credits sequence:

When checking for the existence of a value or values in a table, typically, the whole table does not need to be read. The goal is to obtain more of a true or false answer, whether a criteria is met or not. If the criteria is met with the first few records then there’s no need to read on. Nonetheless, you may come across scripts written to unnecessarily retrieve the complete count of a value in a table. Let’s compare the performance difference between using COUNT(*) and using “IF EXISTS” when checking a table for values.

One’s going to give you a full scan and the other will give you a semi-join. Read on to see what the practical effect of this is.

Comments closed

Aggregations and Distinct Counts in Power BI

Phil Seamark doesn’t have time to wait for Power BI to count:

This article aims to show how you can speed up distinct count calculations in Power BI using the built-in user-defined aggregations feature. The user-defined aggregation feature in Power BI is designed to work with direct query models and usually gets used for calculations such as SUM, MIN, MAX etc. However, it can also work well for distinct count calculations using the pattern shown in this article.

It’s an interesting partial aggregation approach which works really well when the distinct count is a small percentage of the total.

Comments closed

FILTER vs CASE in Postgres

Lukas Eder hits on an interesting data point:

I’ve found an interesting question on Twitter, recently. Is there any performance impact of using FILTER in SQL (PostgreSQL, specifically), or is it just syntax sugar for a CASE expression in an aggregate function?

Click through for the answer and your daily reminder that SQL variants aren’t pure fourth generation languages—if they were, the optimizer would take all possible constructs of a given desired operation and generate the same execution plan for all of them.

Comments closed

Minimizing Cross-Highlighting on Direct Query

Dany Hoter explains why cross filtering might not be the best when you are using Direct Query in Power BI:

Every time a user clicks on a row or a column in a visual, every other visual in the page feels an urge to refresh itself and respond to the click.

Visual that are based on direct query may issue multiple queries for each refresh.

Depending on the number of measures used in the visual, the number of generated queries can be as high as 10 or even more for a single visual.

Read on for more information and what you can do to avoid this problem.

Comments closed

Performance Comparing DISTINCT to GROUP BY

Reitse Eskens does a performance comparison:

A few days ago, I heard someone stating that Group By was much quicker than Distinct. Less disk impact, less memory etc.
So, I thought I’d find out if it’s true or not because I found it interesting. I always thought there was no difference. I tested a single small table and found no difference in speed, reads or execution plan. But that’s no real world example. Usually the tables contain a lot of data and are joined to other tables.

Click through for the results of Reitse’s analysis.

Comments closed

SQL Server 2022 Query Store Hints

Matthew McGiffen takes a hint:

Another neat little feature in SQL Server 2022 is Query Store Hints. This is the ability to apply a query hint through Query Store rather than having to modify existing code or fiddle around with plan guides.


Query hints are a way to influence optimizer behaviour towards generating desired execution plans for a given query. The word “hint” is a bit of a misnomer as usually they mandate what you wish to happen.

Right. They’re ‘hints’ in the way that my wife ‘hints’ that I should take out the garbage.

Comments closed