Press "Enter" to skip to content

Category: T-SQL

Rewriting Tricky Functions in SQL Server

Erik Darling fights dragons:

Far and away, some of the trickiest situations I run into when helping clients involve rewriting scalar functions that have WHILE loops in them.

This sort of procedural code is often difficult, but not impossible, to replace with set-based logic.

Erik improves a function in this post, though often, the best way to improve a function is not to play the game at all.
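As a minimal sketch of the kind of rewrite involved (hypothetical table and names, not Erik's example), a loop that builds a result one row at a time can often collapse into a single set-based statement:

-- Procedural version: visit each row with a WHILE loop
DECLARE @list NVARCHAR(MAX) = N'', @id INT = 1, @max INT;
SELECT @max = MAX(ColorId) FROM dbo.Colors;  -- hypothetical table

WHILE @id <= @max
BEGIN
    SELECT @list = @list + ColorName + N';'
    FROM dbo.Colors
    WHERE ColorId = @id;

    SET @id += 1;
END;

-- Set-based version: one statement, no loop (SQL Server 2017+)
SELECT STRING_AGG(ColorName, N';') WITHIN GROUP (ORDER BY ColorId)
FROM dbo.Colors;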

Comments closed

Updates with Nested REPLACE Functions

Chad Callihan needs to make an update:

Have you ever run into a situation where you needed to replace both ends of a string? Maybe it was tags in a set of XML values or phrases at the beginning and end. I came across that situation recently and was able to show if two statements were needed or if REPLACE could be used against the same column in the same statement. Let’s look at a quick example and see what’s possible.

Chad’s solution is one I use fairly frequently. I agree that it’s not pretty, though one thing I like to do is tokenize code, so I might have something that looks like:

-- Procedure input parameters here
@FirstName NVARCHAR(150),
@FavoriteColor NVARCHAR(30)

-- Guts of procedure here
DECLARE @msg NVARCHAR(4000) = N'Hello, {FIRST_NAME}!  Your favorite color is {FAVORITE_COLOR}!';

SELECT
    REPLACE(REPLACE(@msg,
        '{FIRST_NAME}', @FirstName),
        '{FAVORITE_COLOR}', @FavoriteColor);

It’s definitely not the prettiest, but when you have several tokens to replace, it’s a lot easier to read than building the string all at once.
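And for Chad’s original scenario, replacing both ends of a string in one statement works because the outer REPLACE operates on the result of the inner one. A sketch with a hypothetical table and tags:

-- Strip tags from both ends of the string in a single UPDATE
UPDATE dbo.Notes
SET NoteText = REPLACE(REPLACE(NoteText, N'<note>', N''), N'</note>', N'');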

Comments closed

Generating SQL Code from Metadata

Richard Swinbank shows off one of the more common uses of dynamic SQL:

Now you come to ingest your second table – and you have to do this all again!

It’s a lot of code, and worse, it’s all basically the same – only the names and types are any different. You can easily make a mistake, and the boredom of repetition only makes that more likely. Later on, when you decide you want to add something to the process – a logging statement at the end of the stored proc, for example – you’ll have to add it separately to every stored procedure you’ve created.

The solution? Generate your code automatically, from metadata, using dynamic SQL! 

Read on for what you’d need to pull this off.
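As a minimal sketch of the technique (the table name is hypothetical, and Richard’s version generates full stored procedures rather than a single statement), you can stitch a query together from the system catalog and then execute it:

-- Build a column list for one table from metadata, then execute it
DECLARE @sql NVARCHAR(MAX);

SELECT @sql = N'SELECT '
    + STRING_AGG(CONVERT(NVARCHAR(MAX), QUOTENAME(c.name)), N', ')
        WITHIN GROUP (ORDER BY c.column_id)
    + N' FROM ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) + N';'
FROM sys.tables AS t
    INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    INNER JOIN sys.columns AS c ON c.object_id = t.object_id
WHERE t.name = N'MyStagingTable'  -- hypothetical table
GROUP BY s.name, t.name;

EXEC sys.sp_executesql @sql;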

Comments closed

Generating Code to Run Across All Databases via Dynamic SQL

Aaron Bertrand provides a warning around dynamic SQL:

For October’s T-SQL Tuesday, Steve Jones asks us to talk about ways we’ve used dynamic SQL to solve problems. Dynamic SQL gets a bad rap simply because, like a lot of tools, it can be abused. It doesn’t help that a lot of code samples out there show that “good enough” doesn’t meet the bar most of us have, especially in terms of security.

In a series I started last year, I talked about ways to do <X> to every <Y> inside a database, focusing on the example of refreshing every view (in a single database or across all databases). I already touched on what I want to dig into today: that it can be dangerous to try to parameterize things that can’t be parameterized in the ways people typically try.
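The heart of the problem is that identifiers such as database names simply cannot be parameters, so the safe pattern (my sketch, not Aaron’s code) is to validate the value against the system catalog and then quote it:

DECLARE @db sysname = N'AdventureWorks', @sql NVARCHAR(MAX);

-- You cannot write USE @db; validate the name, then quote it
IF DB_ID(@db) IS NOT NULL
BEGIN
    SET @sql = N'USE ' + QUOTENAME(@db) + N'; SELECT DB_NAME();';
    EXEC sys.sp_executesql @sql;
END;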

Read the whole thing. I do find it funny how often people aren’t allowed to install well-known, third-party stored procedures (like Aaron’s sp_ineachdb) but it’s perfectly okay to write terrible code which is vulnerable to exploit because it was written in-house and is therefore more trustworthy somehow.

I don’t want to dunk on security teams too much in this regard, as I understand and really do appreciate the principle, though it often has counterintuitive first- and second-order consequences.

Comments closed

Merge Joins in SQL Server

Jared Poche continues a series on join types:

Merge joins traverse both inputs once, advancing a row at a time and comparing the values from each input. Since they are in the same order, this is very efficient. We don’t have to pay the cost to create a hash table, and we don’t have the much larger number of index seeks nested loops would encounter.
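If you want to watch this behavior, you can force the physical join type with a hint (hypothetical tables; SQL Server will inject Sort operators if no index supplies the required order):

SELECT o.OrderId, c.CustomerName
FROM dbo.Orders AS o
    INNER MERGE JOIN dbo.Customers AS c
        ON c.CustomerId = o.CustomerId;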

Read the whole thing. Remember: merge joins are also the best strategy for when two lanes of the road come together.

Comments closed

Implicit Conversion of DATEDIFF

Daniel Hutmacher noticed a problem:

As I was performance tuning a query, I found that a number of date calculation functions in SQL Server appear to be forcing a conversion of their date parameters to a specific datatype, adding computational work to a query that uses them. In programming terms, it seems that these functions do not have “overloads”, i.e. different code paths depending on the incoming datatype.

So let’s take a closer look at how this manifests itself.

Some of these results are pretty annoying, especially because I like to use DATETIME2(0) for the large majority of development work.
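If you want to see where this shows up, a sketch like the following (hypothetical table, with OrderDate declared as DATETIME2(0)) lets you run with the actual execution plan and inspect the scalar expression for a CONVERT_IMPLICIT around the column; the exact target type the engine converts to is the detail Daniel digs into:

-- Check the plan's scalar expression for an implicit conversion of OrderDate
SELECT DATEDIFF(DAY, o.OrderDate, SYSDATETIME()) AS DaysElapsed
FROM dbo.Orders AS o;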

Comments closed

Column Lookups and Dynamic SQL

Rob Farley does a double-check:

I’ve written before about what I consider the golden rule with SQL Injection. And that is, to understand that DATA should not be executed like CODE. A parameter is fine – that’s not part of what gets executed – but you do not embed user-entered values into queries. You never create a WHERE clause like “WHERE t.col1 = 3”, if that value 3 is being passed in. Instead, you use “WHERE t.col1 = @param”. Because you can’t assume that “3” is safe to run.

But there are things in queries that can’t be parameterised. Table names, linked server names, column names, for example.

Read on to learn what Rob does in those cases.
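A common pattern for those cases (a sketch in the spirit of Rob’s advice, with hypothetical names) is to check the user-supplied identifier against the catalog and wrap it in QUOTENAME before it ever touches the query string:

DECLARE @col sysname = N'FirstName', @sql NVARCHAR(MAX);

-- Only proceed if the column really exists on the target table
IF EXISTS (SELECT 1 FROM sys.columns
           WHERE object_id = OBJECT_ID(N'dbo.People')
             AND name = @col)
BEGIN
    SET @sql = N'SELECT ' + QUOTENAME(@col) + N' FROM dbo.People;';
    EXEC sys.sp_executesql @sql;
END;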

Comments closed

When Batching Breaks Down

Erik Darling has some thoughts on batching data modifications:

The whole point is to go a little bit easier on your server:

– Smaller transactions sent to the log file

– Shorter locking durations

– Less build-up of the version store (optimistic isolation levels, triggers, etc.)

But there are things that make batching impossible, or downright tricky.

Erik provides more details after the jump. This is all “stuff you should think about” rather than “reasons why you shouldn’t batch.”
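For reference, the basic batching pattern Erik is stress-testing looks something like this (hypothetical table and retention rule):

DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    -- Each iteration is its own small transaction against the log
    DELETE TOP (5000)
    FROM dbo.AuditLog
    WHERE CreatedDate < DATEADD(YEAR, -2, GETDATE());

    SET @rows = @@ROWCOUNT;
END;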

Comments closed

Tuning a Range Query

Grant Fritchey gets forum-sniped:

Recently I was looking through DBA.StackExchange when I saw a pretty simple question that I decided to answer. I went off, set up a test database and some test tables and quickly wrote a query to answer the question. I got it all formatted pretty and was on my way to post it when I saw that another answer was already there.

Yeah. Identical to mine. Almost line for line.

Well, nuts.

I know. I’ll write a blog post.

In thinking about the problem, the thing that caught my attention was Grant’s comment about poor design. This got me thinking about one of my favorite topics: orthogonal design for relational excellence. The idea of a BETWEEN table of [ MinValue : MaxValue ] is the first thing people think of, but it’s also the worst, because you have two big problems: gaps and overlap.

The second solution is to use MinValue and calculate MaxValue (if we actually need it) as LEAD(MinValue) OVER (ORDER BY MinValue) - e, where e represents the smallest reasonable increment we’d need. Queries would find, for each Value in the main table, the largest MinValue below Value. That removes gaps and overlap but might be a performance concern as the main table’s data size grows.
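In code, that second solution might look like this (hypothetical tables, with e = 1 for integer values):

-- Derive MaxValue instead of storing it: no gaps, no overlap
SELECT
    r.MinValue,
    LEAD(r.MinValue) OVER (ORDER BY r.MinValue) - 1 AS MaxValue,
    r.RankDesc
FROM dbo.ValueRanges AS r;

-- For each Value, find the largest MinValue at or below it
SELECT m.Value, x.RankDesc
FROM dbo.MainTable AS m
    CROSS APPLY (SELECT TOP (1) r.RankDesc
                 FROM dbo.ValueRanges AS r
                 WHERE r.MinValue <= m.Value
                 ORDER BY r.MinValue DESC) AS x;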

The big-brain solution, which generally works best when you have a discrete number of options, could be a tally table. In Grant’s example, we see values from 1 to 1000, with a rank for each. If it’s really as simple as that, we would create a new lookup table with Value + RankDesc and simply join the main table’s Value to the lookup table’s Value to get the appropriate RankDesc. Yeah, you have 1000 rows instead of 3 but queries are trivial at that point. The downside is that this approach doesn’t work for continuous variables (e.g., give me the exact amount of your household income for the prior tax year) and the utility of this solution probably breaks down once you get past tens of thousands of rows.
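The tally-style lookup then reduces to a plain join (again with hypothetical names):

-- One row per discrete value: 1000 rows, but the query is trivial
SELECT m.Value, l.RankDesc
FROM dbo.MainTable AS m
    INNER JOIN dbo.ValueRank AS l
        ON l.Value = m.Value;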

In the case of a continuous variable or an enormous discrete variable, we have the simplest option of all: ignore something. If you care about the range, use the table from the second solution and use that ID on the main table. If you care about the value but not the range, just have the value and no lookup table.

Comments closed