Press "Enter" to skip to content

Category: T-SQL

Generating Change Scripts

Steve Jones shows off a rather valuable feature in SQL Server Management Studio:

One of the really basic things I think everyone should understand is how to get scripts from Management Studio (SSMS) and save them. In fact, I’ve written that everyone should use this button and really never execute their GUI changes. Capture the script, save that, and automate things.

However, that’s not what this post is about. This post is about how you get a script to look at changes, or better understand how SSMS might implement your changes.

The ability to script out your changes has a number of benefits, one of which is that you’ll get to learn the code you need to write to perform an action, which could make all the difference in a production-down situation.

Comments closed

Common Table Expressions Aren’t Tables

Grant Fritchey shows that CTEs are not tables; they’re expressions:

The Common Table Expression (CTE) is a great tool in T-SQL. The CTE provides a mechanism to define a query that can be easily reused over and over within another query. The CTE also provides a mechanism for recursion which, though a little dangerous and overused, is extremely handy for certain types of queries. However, the CTE has a very unfortunate name. Over and over I’ve had to walk people back from the “Table” in Common Table Expression. The CTE is just a query. It’s not a table. It’s not providing a temporary storage space like a table variable or a temporary table. It’s just a query. Think of it more like a temporary view, which is also just a query.

Read the whole thing.
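A minimal sketch of the point (the dbo.Orders table here is hypothetical): the CTE below is nothing more than a named query that the outer SELECT references; no rows are stored anywhere.

-- The CTE is only a named query expression; SQL Server expands it
-- into the outer query rather than materializing any rows.
WITH RecentOrders AS
(
    SELECT OrderID, CustomerID, OrderDate
    FROM dbo.Orders
    WHERE OrderDate >= '2016-01-01'
)
SELECT ro.CustomerID, COUNT(*) AS OrderCount
FROM RecentOrders ro
GROUP BY ro.CustomerID;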

Comments closed

CTEs And Semi-Colons

Jason Brimhall discusses common table expressions:

Wait! Hold on two seconds there! Surely the semi-colon is an absolute requirement, because we see everywhere that it is mandatory.

The reality is that the semi-colon requirement is not entirely accurate. If the CTE happens to be in the same batch, then the previous statement in the batch must be terminated by a semi-colon.

This post went down an unexpected path, and ended up being rather interesting.  Read the whole thing.
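To sketch the rule (table names are hypothetical): the semi-colon belongs to the statement before the CTE, not to the WITH itself.

DECLARE @MinDate date = '2016-01-01';  -- this terminator is the one that matters

-- WITH must begin its own statement, so the preceding statement in the
-- batch has to be terminated. If the CTE is the first statement in the
-- batch, no leading semi-colon is needed at all.
WITH RecentOrders AS
(
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE OrderDate >= @MinDate
)
SELECT COUNT(*) AS OrderCount
FROM RecentOrders;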

Comments closed

Comparing Nullable Columns

Daniel Hutmacher shows an elegant way to compare multiple nullable columns on two tables:

Because we’ve added OR conditions into the mix, we’re forced to use the Nested Loop join, which loops over table B for every single row in A. That’s a lot of index scans and it comes with a hefty price tag.

Here’s an absolutely eye-wateringly beautiful pattern that I found on the Interwebs (though I forgot where) the other day.

This is an interesting use of INTERSECT.  Check it out.
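The pattern, roughly sketched with hypothetical tables and columns: an EXISTS over an INTERSECT treats NULLs as equal and keeps the join predicate simple, where a chain of OR / IS NULL checks forces the nested loop described above.

-- INTERSECT uses set semantics, so NULL matches NULL, and the optimizer
-- can still build an efficient plan on the underlying columns.
SELECT a.*
FROM dbo.TableA AS a
    INNER JOIN dbo.TableB AS b
        ON EXISTS
        (
            SELECT a.NullableCol1, a.NullableCol2
            INTERSECT
            SELECT b.NullableCol1, b.NullableCol2
        );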

Comments closed

Handling Delimited Strings

Ed Pollack looks at several methods of creating delimited strings, and then several methods of breaking out delimited strings:

The execution plan is cut off, but you can be assured that there are six more similar plans below the ones pictured here. These metrics are misleading as each loop doesn’t seem too bad, right? Just 9% of the subtree cost or a few hundred reads doesn’t seem too wild, but add up all of these costs and it becomes clear that this won’t scale. What if we had thousands of rows to iterate through? For 5,000 rows, we would be looking at about 147,995,000 reads! Not to mention a very, very long execution plan that is certain to make Management Studio crawl as it renders five thousand execution plans.

Alternatively, we could cache all of the data in a temp table first, and then pull it row-by-row. This would result in significantly fewer reads on the underlying sales data, outperforming cursors by a mile, but would still involve iterating through the temp table over and over. For the scenario of 5,000 rows, we’d still have an inefficient slog through a smaller data set, rather than crawling through lots of data. Regardless of method, we’re still navigating quicksand, just with varying amounts of it.

This is a good article and includes STRING_SPLIT and OPENJSON methods in SQL Server 2016, but does not include building your own CLR methods.  Check it out if you need to do any delimited string manipulation.
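As a taste of the 2016 options Ed covers, here is a minimal sketch (the delimited string is made up; both functions require database compatibility level 130 or higher):

DECLARE @csv nvarchar(max) = N'10,20,30,40';

-- STRING_SPLIT returns one row per element in the list
SELECT value
FROM STRING_SPLIT(@csv, ',');

-- OPENJSON can do the same if you wrap the list in brackets so it
-- parses as a JSON array
SELECT value
FROM OPENJSON(N'[' + @csv + N']');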

Comments closed

Unpivoting With APPLY

I have a post on using the APPLY operator to unpivot data:

This code works, but if you have dozens of years, it gets messy writing those case statements and you’re a bit more likely to make a mistake when refactoring code. Here’s a simpler version using CROSS APPLY:

SELECT
    s.Product,
    y.[Year],
    y.Quantity,
    y.[Value]
FROM #Sales s
    CROSS APPLY (VALUES
        (2013, [Qty2013], [Val2013]),
        (2014, [Qty2014], [Val2014]),
        (2015, [Qty2015], [Val2015])
    ) y([Year], Quantity, [Value]);

It’s a little easier to read than the other version, and adding additional years is pretty straightforward.  That makes for a great tip when you’re trying to refactor poorly-thought-out tables or bring potentially well-thought-out flat files into your system.
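For reference, the #Sales table that query runs against would look something like this (a hypothetical sketch; the column names are inferred from the query above):

CREATE TABLE #Sales
(
    Product varchar(50),
    Qty2013 int,
    Val2013 decimal(12, 2),
    Qty2014 int,
    Val2014 decimal(12, 2),
    Qty2015 int,
    Val2015 decimal(12, 2)
);

INSERT INTO #Sales
VALUES ('Widget', 100, 1500.00, 120, 1800.00, 90, 1350.00);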

APPLY is an elegant solution to so many different classes of problem.

Comments closed

Finding Dates

Derik Hammer shows the right way and the wrong way of using date functions in a WHERE clause:

I then changed my thought process to find the date of birth of a 65-year-old whose birthday is today. Then I compare the DateOfBirth column to that static value.

SELECT COUNT(*) FROM dbo.Person p WHERE DateOfBirth <= DATEADD(YEAR,-65,GETDATE())

With the above query I bought myself an index seek and 345 logical reads. That works out to <3% of the cost.

This is true not just for date functions, but rather is applicable to almost all scalar functions.
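For contrast, the non-sargable shape looks something like this (a sketch; Derik's "wrong way" query may differ in detail):

-- Wrapping the column in a function makes the predicate non-sargable:
-- the expression has to be computed for every row, so an index on
-- DateOfBirth can only be scanned, never seeked.
SELECT COUNT(*)
FROM dbo.Person p
WHERE DATEADD(YEAR, 65, p.DateOfBirth) <= GETDATE();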

Comments closed

CASE Statements In GROUP BY Clauses

Grant Fritchey looks at CASE statements within GROUP BY clauses:

The same basic set of structures, scans against both tables, to arrive at the data. Cost estimates between the two plans are very different though, with the targeted queries having a much lower estimated cost.

Performance-wise, interestingly enough, the average execution time of the first query, only returning the 10 rows, is 157ms on average, while the query grouping directly on the SalesPersonID averages about 190ms. Now, the reads tell a slightly different story with 17428 on the generic query and 5721 on the specific query. So, maybe a server under load will see a significant performance increase. However, let’s deal with what we have in front of us and say that, at least for these tests, the catch-all GROUP BY query performs well.

Grant’s recommendation is to split this out into several procedures, and if you’re having performance problems, that’s a solid move.  I’m a bit more likely to keep them (especially in warehousing reports), but it’s nice to have options.
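The catch-all shape under discussion looks roughly like this (a hypothetical sketch against AdventureWorks-style columns, not Grant's exact query):

DECLARE @GroupBySalesPerson bit = 1;

-- One generic query: a parameter decides whether we group by
-- salesperson or collapse everything into a single row.
SELECT
    CASE WHEN @GroupBySalesPerson = 1 THEN soh.SalesPersonID END AS SalesPersonID,
    SUM(soh.TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader soh
GROUP BY
    CASE WHEN @GroupBySalesPerson = 1 THEN soh.SalesPersonID END;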

Comments closed

LAST_VALUE

Steve Jones plays with a window function new to SQL Server 2012:

The important thing to understand with window functions is that there is a frame at any point in time when the data is being scanned or processed. I’m not sure what the best term to use is.

Let’s look at the same data set Kathi used. For simplicity, I’ll use a few images of her dataset, but I’ll examine the SalesOrderID. I think that can be easier than looking at the amounts.

Here’s the base dataset for two customers, separated by CustomerID and ordered by the OrderDate. I’ve included amount, but it’s really not important.

Steve goes into detail and explains what’s going on each step of the way.  Window functions are extremely useful; check them out if you’re not already familiar with them.
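The frame is the part that trips people up with LAST_VALUE; here is a minimal sketch (the column names come from the post, the rest is an assumption):

-- With the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT
-- ROW), LAST_VALUE simply returns the current row's value. Extending the
-- frame to UNBOUNDED FOLLOWING returns the last SalesOrderID for each
-- customer, which is usually what people expect.
SELECT
    soh.CustomerID,
    soh.OrderDate,
    soh.SalesOrderID,
    LAST_VALUE(soh.SalesOrderID) OVER
    (
        PARTITION BY soh.CustomerID
        ORDER BY soh.OrderDate
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS LastOrderID
FROM Sales.SalesOrderHeader soh;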

Comments closed

How Do You Respond?

Erik Darling has a new interview question:

A new developer has been troubleshooting a sometimes-slow stored procedure, and wants you to review their progress so far. Tell me what could go wrong here.

This one’s a bit tougher than some of the early interview questions, and I think you can see that based on the responses in the comments.

Comments closed