Press "Enter" to skip to content

Category: T-SQL

T-SQL Improvements in CTP 2.1

Itzik Ben-Gan looks at ways recent T-SQL improvements:

One of the complexities that is of a specific importance to this section is comparisons that potentially involve NULL comparands, such as ones that you use in filter and join predicates. Most operators that you use in such comparisons, including the equals (=) and different than (<>) operators, use three-valued logic. This means that there are three possible truth values as a result of a predicate that uses such operators: true, false and unknown. When both comparands are non-NULL, such operators return true or false as you would intuitively expect. When any of the comparands is NULL, including when both are NULL, such comparisons return the unknown truth value. Both filter and join predicates consider unknown as a non-match, and sometimes that’s not the behavior that you’re after.

The distinct predicate (IS [NOT] DISTINCT FROM) should simplify a lot of code in the wild.

Comments closed

Data Quality Checks in Power BI

Kristyna Hughes wants to match up data:

Picture this, you have a report in Power BI that someone passes off to you for data quality checks. There are a few ways to make sure your measures match what is in the source data system, but for this demo we are going to use python and excel to perform our data quality checks in one batch. In order to do that, we are going to build a python script that can run Power BI REST APIs, connect to a SQL Server, and connect to Excel to grab the formulas and to push back the quality check into Excel for final review. To find a sample Excel and the final python script, please refer to my GitHub.

Check out the GitHub repo as well as Kristyna’s very detailed walkthrough.

Comments closed

The Cure to Scalar Functions

Tom Zika has a cure:

In the first two parts, we have seen why the Scalar functions (UDFs) are a problem for the performance. So how do we deal with it now that we know it’s a problem?

There is only one solution:

I say we take off and nuke the entire site from orbit. It’s the only way to be sure.
— Ellen Ripley

Tom’s ideas are intriguing to me and I wish to subscribe to his newsletter.

Comments closed

CETAS and the Serverless SQL Pool

Liliam Leme takes us through the Create External Table as Select operation in the Azure Synapse Analytics serverless SQL pool:

Serverless SQL pool  has a very cool way to handle CREATE EXTERNAL TABLE AS SELECT (CETAS).  I mean, you can use a serverless SQL pool to create a CETAS which will materialize the query results. It means a heavy query in which, for example, the results would be part of future large join with other queries or aggregations that can be consolidated for reporting purposes. Those scenarios are examples that could be used as a CETAS. CETAS will be stored as a parquet file accessed by an external table in your storage and the performance is awesome. Later I am suggesting how to recreate the CETAS using pipeline against serverless SQL Pool.

Click through to see it in action.

Comments closed

Deleting from Multiple Tables by Key

Guy Glantser needs to delete some data:

Sometimes you need to delete data from multiple tables in a database. For example, you might have a multi-tenant database, and you need to delete all the data that belong to some tenant.

The problem is that there are many tables in the database that contain data, which you need to delete. If you have a column like “TenantId” in all tables, then your life is easier, because you have a simple predicate to apply to your DELETE statements against all tables. But even then, if there are foreign keys between tables to enforce referential integrity, then things get more complicated.

Read through for Guy’s answer, which definitely works and can be the quickest solution. If you can’t drop foreign key constraints (even temporarily), I have a post from a while back on tracing foreign keys to “levels.” The post only covers finding the ordering but could be extended to delete data one level at a time.

Comments closed

Using the WINDOW Clause in SQL Server 2022

Hasan Savran peeks through the windows:

WINDOW Operations in SQL Server can be hard to understand.  I believe one of the main reasons for that is the long and repeated code it needs. SELECT…WINDOW Clause will help us to remove repeated code and hopefully it will make the WINDOW Operations more user-friendly.

I am really happy about this syntax change, as often times, we have to re-use the same window frame (PARTITION BY and ORDER BY) or a base frame with a minor change (add one more column to the end). This removes the kind of repetition which makes queries harder to read and introduces subtle bugs.

Comments closed

Creating Human-Readable Intervals from Lists in T-SQL

Daniel Hutmacher is making a list and checking it twice:

If you’ve worked with reporting, you’ve probably come across the following problem. You have a list of values, say “A, B, C, D, K, L, M, N, R, S, T, U, Z” that you want to display in a more user-friendly, condensed manner, “A-D, K-N, R-U, Z”.

Today, we’re going to look at how you can accomplish this in T-SQL, and what this has to do with window functions and gaps and islands.

Read on for a really good demonstration of how powerful window functions and STRING_AGG() can be.

Comments closed

Parameter Sensitive Plan Optimization with Branches and Local Variables

Erik Darling has some mixed news. First up, if you branch a lot:

I’ve spent a bit of time talking about how IF branches can break query performance really badly in SQL Server.

While the Parameter Sensitive Plan (PSP) optimization won’t fix every problem with this lazy coding habit, it can fix some of them in very specific circumstances, assuming:

– The parameter is eligible for PSP

– The parameter is present across IF branches

Less sanguine news if you use local variables a lot:

One fix I’ve been wishing for, or wish I’ve been fixing for, is a cure for local variables. I’d even be cool if Forced Parameterization was that cure, but you know…

Time will tell.

Though I prefer to call local variables an “Optimize for mediocre” plan hint.

Comments closed

The Performance Pain of User-Defined Functions

Tom Zika continues a series on why user-defined scalar functions are such a bad idea:

I’ll cover several test scenarios and analyze the performance using different monitoring tools.
The results will be for the second executions of the queries, so we have compiled and cached plans and all pages in the buffer pool.

Even if you already hate seeing scalar user-defined functions in code, the occasional reminder of how poorly they perform helps focus the mind.

Comments closed