Press "Enter" to skip to content

Category: T-SQL

Fun with Function Rewrites

Erik Darling reminds me why I hate user-defined functions in SQL Server:

At 23 seconds, this is probably unacceptable. And this is on SQL Server 2019, too. The function inlining thing doesn’t quite help us, here.

One feature restriction is this, so we uh… Yeah.

The UDF does not contain aggregate functions being passed as parameters to a scalar UDF

But we’re probably good query tuners, and we know we can write inline functions.
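The usual escape hatch, sketched here with hypothetical object names, is to rewrite the scalar UDF as an inline table-valued function and invoke it with CROSS APPLY, so the optimizer can expand it into the calling query:

-- Inline TVF standing in for a scalar UDF (hypothetical names throughout)
CREATE OR ALTER FUNCTION dbo.days_since_inline (@d datetime2)
RETURNS TABLE
AS
RETURN
    SELECT DATEDIFF(DAY, @d, SYSDATETIME()) AS days_since;
GO

-- Aggregate first in a derived table, then apply the function to the result
SELECT o.customer_id, f.days_since
FROM (
    SELECT customer_id, MAX(order_date) AS max_order_date
    FROM dbo.orders
    GROUP BY customer_id
) AS o
CROSS APPLY dbo.days_since_inline(o.max_order_date) AS f;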

Read the whole thing, as this is not always straightforward.

Optimizing Multiple CTEs

Itzik Ben-Gan continues a series on table expressions:

Last month I explained and demonstrated that CTEs get unnested, whereas temporary tables and table variables actually persist data. I provided recommendations in terms of when it makes sense to use CTEs versus when it makes sense to use temporary objects from a query performance standpoint. But there’s another important aspect of CTE optimization, or physical processing, to consider beyond the solution’s performance—how multiple references to the CTE from an outer query are handled. It’s important to realize that if you have an outer query with multiple references to the same CTE, each gets unnested separately. If you have nondeterministic calculations in the CTE’s inner query, those calculations can have different results in the different references.

Say for instance that you invoke the SYSDATETIME function in a CTE’s inner query, creating a result column called dt. Generally, assuming no change in the inputs, a built-in function is evaluated once per query and reference, irrespective of the number of rows involved. If you refer to the CTE only once from an outer query, but interact with the dt column multiple times, all references are supposed to represent the same function evaluation and return the same values. However, if you refer to the CTE multiple times in the outer query, be it with multiple subqueries referring to the CTE or a join between multiple instances of the same CTE (say aliased as C1 and C2), the references to C1.dt and C2.dt represent different evaluations of the underlying expression and could result in different values.
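A minimal repro of the effect Itzik describes (my sketch, not his code):

WITH C AS
(
    SELECT SYSDATETIME() AS dt
)
SELECT C1.dt AS dt1, C2.dt AS dt2
FROM C AS C1
    CROSS JOIN C AS C2;
-- Each reference to C is unnested separately, so dt1 and dt2 represent
-- separate evaluations of SYSDATETIME() and can come back different.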

Definitely worth the read.

String Concatenation in SQL Server

Guy Glantser hits on a pain point in SQL Server when dealing with long strings:

Now, let’s talk about concatenation. What do you think would be the data type of the following expression?

N’ABCD’ + N’EFG’

Correct! It’s NVARCHAR(7). Everything is making sense. Isn’t it great?

Now, let’s complicate things just a little bit. Suppose you have an expression that is a concatenation of two string literals – one of them contains 3,000 characters and the other contains 2,000 characters.
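Spoiler, in sketch form: when neither operand is a MAX type, the result is capped at 4,000 characters, so the concatenation silently truncates unless you convert one side to NVARCHAR(MAX) first:

DECLARE @s nvarchar(max) = REPLICATE(N'A', 3000) + REPLICATE(N'B', 2000);
SELECT LEN(@s);  -- 4000, not 5000: the expression is typed before the assignment

DECLARE @s2 nvarchar(max) = CONVERT(nvarchar(max), REPLICATE(N'A', 3000)) + REPLICATE(N'B', 2000);
SELECT LEN(@s2); -- 5000: one NVARCHAR(MAX) operand promotes the whole expression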

Guy also has a function to print beyond 4000 Unicode characters:

Sometimes, you want to print a long string. For example, you might want to print the definition of a long stored procedure. Or you might have a very long dynamic batch that you are going to execute, but you want to print it first for debug purposes.

The problem with the PRINT statement is not only that it prints up to the first 8,000 bytes. It also truncates your text without even generating a warning.

This is a long-standing frustration of mine, especially when writing out complicated dynamic SQL. I think PRINT should have been changed 15 years ago to handle MAX types.
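The usual workaround, in crude sketch form (Guy's function at the link is more careful), is to loop and PRINT in chunks that stay under the limit:

DECLARE @sql nvarchar(max) = REPLICATE(CONVERT(nvarchar(max), N'X'), 10000);
DECLARE @pos int = 1;

WHILE @pos <= LEN(@sql)
BEGIN
    PRINT SUBSTRING(@sql, @pos, 4000);  -- each PRINT stays under the truncation point
    SET @pos += 4000;
END;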

From JSON to SQL Server

Phil Factor has some helper functions for us when working with JSON data:

If you know the structure and contents of a JSON document, then it is possible to turn this into one or more relational tables, but even then I dare you to claim that it is easy to tap in a good OpenJSON SELECT statement to do it. If you don’t know what’s in that JSON file, then you’re faced with sweating over a text editor trying to work it all out. You long to just get the contents into a relational table and take it on from there. Even then, you’ve got several struggles before that table appears in the result pane. You must get the path to the tabular data correct, you have to work out the SQL datatypes, and you need to list the full panoply of keys. Let’s face it: it is a chore. Hopefully, all that is in the past with these helper functions.
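For a sense of the chore Phil means, this is the bare OPENJSON pattern you hand-write once you already know the structure (hypothetical JSON and column names):

DECLARE @json nvarchar(max) = N'{"people":[{"name":"Ann","age":32},{"name":"Bob","age":41}]}';

SELECT p.name, p.age
FROM OPENJSON(@json, '$.people')
    WITH (
        name nvarchar(50) '$.name',
        age  int          '$.age'
    ) AS p;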

Click through for those functions.

Retrieving Text Between Delimiters

Erik Darling takes us through the seedy underbelly of T-SQL:

I have to do this a fair amount, and I always go look at this little cheat sheet that I have.

Then it occurred to me that it might be worth sharing the details here, in case anyone else runs into the same need.

The way I learned to do it is with SUBSTRING and CHARINDEX, which is a pretty common method.
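The pattern, in a self-contained sketch (my sample string, not Erik's):

DECLARE @s nvarchar(100) = N'server01[db_name]instance';

SELECT SUBSTRING(
           @s,
           CHARINDEX(N'[', @s) + 1,
           CHARINDEX(N']', @s) - CHARINDEX(N'[', @s) - 1
       ) AS between_delimiters;  -- returns db_name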

That this is possible is great, but it’d be nice to have an easier approach. Thinking through that easier approach is outside the scope of this post…

More Fun with NULL

Chris Johnson troubleshoots an issue in code:

The poster had a CASE statement and was wondering why it didn’t work as expected. Essentially they were doing something like:

CASE WHEN @a = @b OR (@a IS NULL AND @b IS NULL) THEN 1 ELSE 0 END
CASE WHEN NOT (@a = @b OR (@a IS NULL AND @b IS NULL)) THEN 1 ELSE 0 END

And they wanted to know why both were returning 0 when @a or @b was set to NULL. The issue here is that any normal predicate involving NULL evaluates to UNKNOWN. They had tried to compensate with the OR, which got them the result they wanted in the first statement, but they didn’t understand why it worked.
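A quick demonstration of the three-valued logic at work (my sketch of the repro):

DECLARE @a int = NULL, @b int = 1;

SELECT
    -- UNKNOWN OR FALSE is UNKNOWN, so this falls through to ELSE: 0, as desired
    CASE WHEN @a = @b OR (@a IS NULL AND @b IS NULL) THEN 1 ELSE 0 END AS equal_check,
    -- NOT (UNKNOWN) is still UNKNOWN, so this also returns 0 rather than 1
    CASE WHEN NOT (@a = @b OR (@a IS NULL AND @b IS NULL)) THEN 1 ELSE 0 END AS not_equal_check;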

Click through for the explanation.

Altering User-Defined Table Types

Michael J. Swart has a clever solution to the inability to alter user-defined table types:

Last year, Aaron Bertrand tackled the question, How To Alter User Defined Table Types. Aaron points out that “There is no ALTER TYPE, and you can’t drop and re-create a type that is in use”. Aaron’s suggestion was to create a new type and then update all procedures to use the new type.

I think I’ve got a bit of an improvement based on sp_rename and sp_refreshsqlmodule.

This is a clever solution. Prior to it, my workflow was:

  1. Create a new user-defined table type
  2. Create new stored procedures which reference the new user-defined table type
  3. Alter and deploy code to call these new stored procedures
  4. Drop the old procedure and user-defined table type

If the changes are such that they don’t require immediate app changes to use (for example, adding a nullable column to the UDTT), this can save a lot of effort.
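Roughly, the rename approach looks like this (a sketch with hypothetical names; see Michael's post for the careful version, including transaction handling):

-- 1. Create the new version of the type under a temporary name
CREATE TYPE dbo.OrderIdList_new AS TABLE
(
    order_id int NOT NULL,
    note     nvarchar(100) NULL  -- the new, nullable column
);
GO

-- 2. Swap the names, then rebind the modules that reference the type
EXEC sp_rename 'dbo.OrderIdList', 'OrderIdList_old', 'USERDATATYPE';
EXEC sp_rename 'dbo.OrderIdList_new', 'OrderIdList', 'USERDATATYPE';
EXEC sp_refreshsqlmodule 'dbo.ProcedureThatUsesTheType';

-- 3. Once nothing references the old type, drop it
DROP TYPE dbo.OrderIdList_old;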

Optimizing Common Table Expressions

Itzik Ben-Gan continues a series on common table expressions:

If you’re wondering why not use a much simpler solution with a grouped query and a HAVING filter, it has to do with the density of the shipperid column. The Orders table has 1,000,000 orders, and the shipments of those orders were handled by five shippers, meaning that on average, each shipper handled 20% of the orders. The plan for a grouped query computing the maximum order date per shipper would scan all 1,000,000 rows, resulting in thousands of page reads. Indeed, if you highlight just the CTE’s inner query (we’ll call it Query 3) computing the maximum order date per shipper and check its execution plan, you will get the plan shown in Figure 3.
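The contrast, roughly sketched against his Orders/Shippers setup (my abbreviation, assuming a supporting index on Orders(shipperid, orderdate)):

-- Grouped query: scans all 1,000,000 Orders rows
SELECT shipperid, MAX(orderdate) AS maxod
FROM dbo.Orders
GROUP BY shipperid;

-- Seek-per-shipper alternative: one index seek for each of the five shippers
SELECT S.shipperid, A.maxod
FROM dbo.Shippers AS S
    CROSS APPLY (
        SELECT MAX(O.orderdate) AS maxod
        FROM dbo.Orders AS O
        WHERE O.shipperid = S.shipperid
    ) AS A;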

Read on for classic Itzik.

ANSI String Comparison

Greg Dodd takes us through one of the oddities of ANSI string comparison:

So let’s play a game: what will the output be if I pass in the following? I’ve included my guesses:

EXEC AreStringsTheSame N'Greg', N'Greg' --Yes
EXEC AreStringsTheSame N'Greg', N'Dodd' --No
EXEC AreStringsTheSame N'', N'' --Yes
EXEC AreStringsTheSame N' ', N' ' --Yes
EXEC AreStringsTheSame N' ', N'' --No
EXEC AreStringsTheSame N' Greg', 'Greg' --No, leading space
EXEC AreStringsTheSame N'Greg ', 'Greg' --No, trailing space

Read on to see what Greg ended up guessing wrong and why.
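The rule in play, for anyone who wants the punch line early: per the ANSI standard, SQL Server pads the shorter operand with spaces before comparing, so trailing spaces never affect equality while leading spaces do:

SELECT CASE WHEN N'Greg ' = N'Greg' THEN 'same' ELSE 'different' END;  -- same: trailing space padded away
SELECT CASE WHEN N' Greg' = N'Greg' THEN 'same' ELSE 'different' END;  -- different: leading space counts
SELECT CASE WHEN N' ' = N'' THEN 'same' ELSE 'different' END;          -- same: both pad out equal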

Efficient Polling of Remote Data Sources

Ed Pollack looks at ways of optimizing linked server connections to remote SQL Server and Postgres instances:

Data rarely resides in one place. Oftentimes, there is a need to collect data from many different data sources and combine them locally into meaningful reports, analytics, or tables. The process for accessing, collecting, validating, and using data from remote data sources requires the same level of design and architectural considerations as building data structures for a software application.

In this article, methods of accessing remote data will be introduced with the goal of presenting best practices and ways to optimize load processes. This will lead into a discussion of performance and how to avoid the latency often associated with loading data from remote locations.
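One classic instance of what Ed is optimizing, as a general sketch (hypothetical linked server and table names): a four-part name can drag remote rows across the wire before filtering locally, while OPENQUERY pushes the whole statement to the remote server:

-- Four-part name: the filter may be applied locally, after pulling remote rows
SELECT order_id, order_date
FROM LINKEDSRV.RemoteDb.dbo.Orders
WHERE order_date >= '20250101';

-- OPENQUERY: the statement executes remotely; only matching rows come back
SELECT order_id, order_date
FROM OPENQUERY(LINKEDSRV,
    'SELECT order_id, order_date FROM RemoteDb.dbo.Orders WHERE order_date >= ''20250101''');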

Click through for the article.
