Press "Enter" to skip to content

Category: T-SQL

Comparing Nullable Columns

Daniel Hutmacher shows an elegant way to compare multiple nullable columns between two tables:

Because we’ve added OR conditions into the mix, we’re forced to use the Nested Loop join, which loops over table B for every single row in A. That’s a lot of index scans and it comes with a hefty price tag.

Here’s an absolutely eye-wateringly beautiful pattern that I found on the Interwebs (though I forgot where) the other day.

This is an interesting use of INTERSECT.  Check it out.
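The shape of the trick, sketched here with hypothetical table and column names, is to join on an EXISTS over an INTERSECT of the two column lists:

SELECT a.ID AS A_ID, b.ID AS B_ID
FROM dbo.TableA AS a
    INNER JOIN dbo.TableB AS b
        ON EXISTS (
            SELECT a.Col1, a.Col2, a.Col3
            INTERSECT
            SELECT b.Col1, b.Col2, b.Col3
        );

Because INTERSECT follows set semantics, two NULLs compare as equal, so you get NULL-safe comparisons without the chain of OR conditions, and the optimizer can still use an efficient join.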


Handling Delimited Strings

Ed Pollack looks at several methods of creating delimited strings, and then several methods of breaking out delimited strings:

The execution plan is cut off, but you can be assured that there are six more similar plans below the ones pictured here. These metrics are misleading as each loop doesn’t seem too bad, right? Just 9% of the subtree cost or a few hundred reads doesn’t seem too wild, but add up all of these costs and it becomes clear that this won’t scale. What if we had thousands of rows to iterate through? For 5,000 rows, we would be looking at about 147,995,000 reads! Not to mention a very, very long execution plan that is certain to make Management Studio crawl as it renders five thousand execution plans.

Alternatively, we could cache all of the data in a temp table first, and then pull it row-by-row. This would result in significantly less reads on the underlying sales data, outperforming cursors by a mile, but would still involve iterating through the temp table over and over. For the scenario of 5,000 rows, we’d still have an inefficient slog through a smaller data set, rather than crawling through lots of data. Regardless of method, it’s still navigating quicksand either way, with varying amounts of quicksand.

This is a good article, and it covers the STRING_SPLIT and OPENJSON methods new to SQL Server 2016, though it does not cover building your own CLR methods.  Check it out if you need to do any delimited string manipulation.
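If you just want a taste of the two 2016 features, here’s a minimal sketch (my illustration, not Ed’s code; both require database compatibility level 130):

-- STRING_SPLIT breaks a delimited string into rows
SELECT value
FROM STRING_SPLIT('alpha,beta,gamma', ',');

-- OPENJSON does the same job for a JSON array, and its [key] column preserves element order
SELECT [key] AS position, value AS element
FROM OPENJSON('["alpha","beta","gamma"]');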


Unpivoting With APPLY

I have a post on using the APPLY operator to unpivot data:

This code works, but if you have dozens of years, it gets messy writing those case statements and you’re a bit more likely to make a mistake when refactoring code. Here’s a simpler version using CROSS APPLY:

SELECT
    s.Product,
    y.[Year],
    y.Quantity,
    y.[Value]
FROM #Sales s
    CROSS APPLY (VALUES
        (2013, [Qty2013], [Val2013]),
        (2014, [Qty2014], [Val2014]),
        (2015, [Qty2015], [Val2015])
    ) y([Year], Quantity, [Value]);

It’s a little easier to read than the other version, and adding additional years is pretty straightforward.  That makes it a great technique when you’re refactoring poorly-thought-out tables or bringing potentially well-thought-out flat files into your system.

APPLY is an elegant solution to so many different classes of problem.
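If you want to run the snippet above, here’s a guess at the shape of #Sales it assumes (the table definition isn’t shown in the excerpt, so treat this as hypothetical):

CREATE TABLE #Sales
(
    Product varchar(50) NOT NULL,
    Qty2013 int NULL, Val2013 decimal(18, 2) NULL,
    Qty2014 int NULL, Val2014 decimal(18, 2) NULL,
    Qty2015 int NULL, Val2015 decimal(18, 2) NULL
);

INSERT INTO #Sales VALUES ('Widget', 10, 100.00, 12, 130.00, 9, 95.00);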


Finding Dates

Derik Hammer shows the right way and the wrong way of using date functions in a WHERE clause:

I then changed my thought process to find the birth date of a 65-year-old whose birthday is today. Then I compare the DateOfBirth column to that static value.

SELECT COUNT(*) FROM dbo.Person p WHERE DateOfBirth <= DATEADD(YEAR,-65,GETDATE())

With the above query I bought myself an index seek and 345 logical reads. That works out to <3% of the cost.

This is true not just of date functions; it applies to almost any scalar function wrapped around a column in the WHERE clause.
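For contrast, here’s a sketch of the non-sargable version of the same filter (my illustration, reusing the table and column names from the quote):

-- Slow: wrapping the column in a function blocks an index seek, forcing a scan
SELECT COUNT(*)
FROM dbo.Person AS p
WHERE DATEADD(YEAR, 65, p.DateOfBirth) <= GETDATE();

Derik’s rewrite above is the same predicate with the arithmetic moved off of the column, which is what lets the optimizer seek an index on DateOfBirth.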


CASE Statements In GROUP BY Clauses

Grant Fritchey looks at CASE statements within GROUP BY clauses:

The same basic set of structures, scans against both tables, to arrive at the data. Cost estimates between the two plans are very different though, with the targeted queries having a much lower estimated cost.

Performance-wise, interestingly enough, the average execution time of the first query, only returning the 10 rows, is 157ms on average, while the query grouping directly on the SalesPersonID averages about 190ms. Now, the reads tell a slightly different story with 17428 on the generic query and 5721 on the specific query. So, maybe a server under load will see a significant performance increase. However, let’s deal with what we have in front of us and say that, at least for these tests, the catch-all GROUP BY query performs well.

Grant’s recommendation is to split this out into several procedures, and if you’re having performance problems, that’s a solid move.  I’m a bit more likely to keep the catch-all query (especially in warehousing reports), but it’s nice to have options.
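For anyone who hasn’t seen the pattern, a catch-all GROUP BY looks something like this (a sketch against AdventureWorks-style names, not Grant’s exact query):

-- The CASE lumps every salesperson outside the target list into a single bucket
SELECT
    CASE WHEN soh.SalesPersonID IN (275, 276, 277)
         THEN soh.SalesPersonID
         ELSE -1
    END AS SalesPersonBucket,
    SUM(soh.TotalDue) AS TotalDue
FROM Sales.SalesOrderHeader AS soh
GROUP BY
    CASE WHEN soh.SalesPersonID IN (275, 276, 277)
         THEN soh.SalesPersonID
         ELSE -1
    END;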


LAST_VALUE

Steve Jones plays with a window function new to SQL Server 2012:

The important thing to understand with window functions is that there is a frame at any point in time when the data is being scanned or processed. I’m not sure what the best term to use is.

Let’s look at the same data set Kathi used. For simplicity, I’ll use a few images of her dataset, but I’ll examine the SalesOrderID. I think that can be easier than looking at the amounts.

Here’s the base dataset for two customers, separated by CustomerID and ordered by the OrderDate. I’ve included amount, but it’s really not important.

Steve goes into detail and explains what’s going on each step of the way.  Window functions are extremely useful; check them out if you’re not already familiar with them.
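The frame is exactly the gotcha here.  A minimal sketch, using the table and column names from the quoted dataset:

SELECT
    soh.CustomerID,
    soh.SalesOrderID,
    -- With only ORDER BY, the default frame stops at the current row,
    -- so LAST_VALUE simply returns the current row's own value
    LAST_VALUE(soh.SalesOrderID) OVER
        (PARTITION BY soh.CustomerID ORDER BY soh.OrderDate) AS DefaultFrame,
    -- Extending the frame to the end of the partition returns the true last value
    LAST_VALUE(soh.SalesOrderID) OVER
        (PARTITION BY soh.CustomerID ORDER BY soh.OrderDate
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS WholePartition
FROM Sales.SalesOrderHeader AS soh;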


How Do You Respond?

Erik Darling has a new interview question:

A new developer has been troubleshooting a sometimes-slow stored procedure, and wants you to review their progress so far. Tell me what could go wrong here.

This one’s a bit tougher than some of the earlier interview questions, and I think you can see that from the responses in the comments.


LEN Is For Strings

Kenneth Fisher notes that the LEN function can behave oddly on non-string data types:

Which shows you that the FLOAT had to be converted to VARCHAR. You can see the same thing if you try it with various versions of INT or DATE datatypes as well. Like I said earlier, no big deal with INT or even DATE. Those come back in a fairly expected format. (INTs look exactly the same and DATEs come back as ‘YYYY-MM-DD’.) FLOAT and REAL, however, are floating point, so they don’t always convert the same way. If you do the conversion deliberately you get this:

Understand your data types; otherwise, they might come back to hurt you later.
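A quick sketch of the behavior (the exact converted string can vary, but on a default setup):

DECLARE @f float = 123456789.123456789;

-- LEN implicitly converts its argument to a string before measuring it
SELECT LEN(@f) AS LenOfFloat,              -- 12: the length of the converted string
    CAST(@f AS varchar(30)) AS Converted;  -- '1.23457e+008': float-to-string defaults to scientific notation with six significant digits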


Plus Equals Operator

Andy Mallon shows off the += operator in T-SQL:

This is logically equivalent to the first version of the code, but I find it makes for more readable code. It just looks cleaner.

For those of us who are lazy (er, looking to maximize efficiency), this could save a whole lot of keystrokes.

This is true, but if you’re on SQL Server 2012 or later, check out CONCAT for concatenation, as it handles NULL values more elegantly.
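Both pieces in one quick sketch:

DECLARE @s varchar(100) = 'Hello';

SET @s += ', world';  -- equivalent to SET @s = @s + ', world'

SELECT
    @s + NULL AS PlusWithNull,           -- NULL: + propagates NULLs
    CONCAT(@s, NULL) AS ConcatWithNull;  -- 'Hello, world': CONCAT treats NULL as an empty string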


Calculating Weekdays Between Dates

Daniel Hutmacher shows how to calculate the number of weekdays between two dates:

It has to be said, if performance is important (and it should be), you’ll know that user-defined scalar T-SQL functions can kill your query. So you might want to implement the above in the form of a CROSS APPLY instead.

Here’s where I advocate for two tables on every instance: a tally table and a date table.  The date table should look like a date dimension, including every potentially-interesting piece of information about a date, including whether it’s a weekday.  It might not perform quite as fast as Daniel’s solution (I’d have to test to know for sure), but it’s definitely easier.  If you can’t get a viable date table on your instances for whatever reason, Daniel’s solution does work and does not require any additional objects.
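Here’s roughly what the date-table version looks like, assuming a hypothetical dbo.DateDimension with one row per calendar date and an IsWeekday flag:

DECLARE @StartDate date = '2016-07-01',
        @EndDate   date = '2016-08-01';

-- Count weekdays in a half-open interval with a simple seek against the date table
SELECT COUNT(*) AS Weekdays
FROM dbo.DateDimension AS d
WHERE d.[Date] >= @StartDate
    AND d.[Date] < @EndDate
    AND d.IsWeekday = 1;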
