Press "Enter" to skip to content

Category: T-SQL

Human-Readable Ranges

Daniel Hutmacher shows us how to build human-readable ranges of integers and dates:

This is a real-world problem that I came across the other day. In a reporting scenario, I wanted to output a number of values in an easy, human-readable way for a report. But just making a long, comma-separated string of numbers doesn’t really make it very readable. This is particularly true when there are hundreds of values.

So here’s a powerful pattern to solve that task.

I really like this.  It takes the gaps & islands problem and goes one step further.
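
As a rough illustration of the underlying idea (a minimal sketch, not Daniel's exact pattern, assuming a hypothetical #Numbers table of integers), a gaps-and-islands query can group consecutive values and render each island as a single value or a "lo-hi" range:

WITH Islands AS
(
    -- Consecutive integers share the same (n - row number) value,
    -- so grp identifies each island
    SELECT n, n - ROW_NUMBER() OVER (ORDER BY n) AS grp
    FROM #Numbers
)
SELECT STUFF((
    SELECT ', ' + CASE WHEN MIN(n) = MAX(n)
                       THEN CAST(MIN(n) AS VARCHAR(10))
                       ELSE CAST(MIN(n) AS VARCHAR(10)) + '-' + CAST(MAX(n) AS VARCHAR(10))
                  END
    FROM Islands
    GROUP BY grp
    ORDER BY grp
    FOR XML PATH('')), 1, 2, '') AS HumanReadable;

Given the values 1, 2, 3, 5, 8, 9, this produces 1-3, 5, 8-9.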


Differences In Type

Grant Fritchey explains the differences between table variables, temporary tables, and common table expressions:

Don’t go getting all excited. I recognize that these two plans look similar, but they are different. First, let me point out that we have more reads with 1546 and an increase in duration to 273ms. This comes from two places. First, we’re creating statistics on the data in the temporary table where none exist on the table variable. Second, because I want to run this script over and over, I’m including the DROP TABLE statement, which is adding overhead that I wouldn’t see if I treated it like the table variable (which I could, but not here). However, breaking down to the statement level, I get 250ms duration, just like with the table variable, but I see 924 reads.

What’s going on?

There certainly are differences between the three.  Read on for more details.
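
As a quick refresher on the three constructs (a minimal sketch, not Grant's test setup):

-- Table variable: no statistics are created on its data
DECLARE @Rows TABLE (ID INT PRIMARY KEY);

-- Temporary table: statistics get created, and dropping it explicitly adds overhead
CREATE TABLE #Rows (ID INT PRIMARY KEY);

-- Common table expression: just a named query, with no storage at all
WITH Rows AS (SELECT 1 AS ID)
SELECT ID FROM Rows;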


Range And Variance

Mala Mahadevan looks at calculating range, variance, and standard deviation in R and T-SQL:

The first and most common measure of dispersion is called ‘Range‘. The range is just the difference between the maximum and minimum values in the dataset. It tells you how much gap there is between the two and therefore how wide the dataset is in terms of its values. It is, however, quite misleading when you have outliers in the data. If you have one value that is very large or very small, that can skew the Range without really meaning you have values spanning the minimum to the maximum.

To lessen this kind of issue with outliers, a second variation of the range, called the Inter-Quartile Range (IQR), is used. The IQR is calculated by sorting the values in ascending order and dividing the dataset into four equal parts. The maximum of the first part (the 25th percentile) is then subtracted from the maximum of the third part (the 75th percentile). Because it ignores the extreme quarters at either end, the IQR gives a measure of spread that is much less sensitive to outliers.

Just like her previous post, this one also includes an example built for SQL Server R Services.
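
On the T-SQL side, here is a minimal sketch (assuming a hypothetical #Measures table with a single numeric Val column) that computes both measures using MAX/MIN and PERCENTILE_CONT:

CREATE TABLE #Measures (Val DECIMAL(10, 2));
INSERT INTO #Measures (Val)
VALUES (3), (5), (8), (9), (14), (21), (250);  -- 250 is an outlier

SELECT DISTINCT
    MAX(Val) OVER () - MIN(Val) OVER () AS [Range],
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY Val) OVER ()
        - PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY Val) OVER () AS IQR
FROM #Measures;

The range (247) is dominated by the outlier, while the IQR (11) is not.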


Simplified Order Of Operations

Michael J. Swart looks at how SQL Server implements order of operations:

I have a book on my shelf called Practical C Programming by Steve Oualline, published by O’Reilly (the cow book). I still love it today because although I don’t code in C any longer, the book remains a great example of good technical writing.

That book has some relevance to SQL today. Instead of memorizing the full list of operators and their precedence, Steve gives a practical subset:

    1. * (Multiply), / (Divide)
    2. + (Add), - (Subtract)

Put parentheses around everything else.

Parentheses, even when unnecessary, are usually a good idea.  They help the reader understand what was going through your mind at the time.
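
A quick illustration of the advice in action:

SELECT 2 + 3 * 4   AS NoParens,    -- 14: multiplication binds first
       (2 + 3) * 4 AS WithParens;  -- 20: the parentheses make the intent explicit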


Generating Change Scripts

Steve Jones shows off a rather valuable feature in SQL Server Management Studio:

One of the really basic things I think everyone should understand is how to get scripts from Management Studio (SSMS) and save them. In fact, I’ve written that everyone should use this button and never actually execute their GUI changes. Capture the script, save that, and automate things.

However, that’s not what this post is about. This post is about how to get a script to look at changes, or to better understand how SSMS might implement your changes.

The ability to script out your changes has a number of benefits, one of which is that you’ll get to learn the code you need to write to perform an action, which could make all the difference in a production-down situation.
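
As a trivial example (with a hypothetical table; the exact script SSMS generates depends on the change), adding a column through the table designer scripts out as something like:

BEGIN TRANSACTION;

ALTER TABLE dbo.Customers
    ADD CreatedDate DATETIME2(7) NULL;  -- hypothetical table and column

COMMIT;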


Common Table Expressions Aren’t Tables

Grant Fritchey shows that CTEs are not tables; they’re expressions:

The Common Table Expression (CTE) is a great tool in T-SQL. The CTE provides a mechanism to define a query that can be easily reused over and over within another query. The CTE also provides a mechanism for recursion which, though a little dangerous and overused, is extremely handy for certain types of queries. However, the CTE has a very unfortunate name. Over and over I’ve had to walk people back from the “Table” in Common Table Expression. The CTE is just a query. It’s not a table. It’s not providing a temporary storage space like a table variable or a temporary table. It’s just a query. Think of it more like a temporary view, which is also just a query.

Read the whole thing.
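
A minimal sketch of the point (using a hypothetical dbo.Orders table): because a CTE is just a named query, referencing it twice means the underlying query can be evaluated twice, rather than being read back from temporary storage.

WITH ExpensiveOrders AS
(
    SELECT OrderID, TotalDue
    FROM dbo.Orders            -- hypothetical table
    WHERE TotalDue > 10000
)
SELECT a.OrderID, b.TotalDue
FROM ExpensiveOrders AS a
    INNER JOIN ExpensiveOrders AS b
        ON b.OrderID = a.OrderID;  -- each reference can expand to the full query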


CTEs And Semi-Colons

Jason Brimhall discusses common table expressions:

Wait! Hold on two seconds there! Surely the semi-colon is an absolute requirement, because we see everywhere that it is mandatory.

The reality is that the semi-colon requirement is not entirely accurate. If the CTE happens to be in the same batch as other statements, then the previous statement in the batch must be terminated by a semi-colon.

This post went down an unexpected path, and ended up being rather interesting.  Read the whole thing.
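
In other words, the semi-colon terminates the preceding statement; it is not part of the CTE syntax itself:

DECLARE @n INT = 5;    -- this terminator is what the CTE actually needs
WITH Nums AS
(
    SELECT @n AS n
)
SELECT n
FROM Nums;             -- a CTE at the start of a batch needs no leading semi-colon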


Comparing Nullable Columns

Daniel Hutmacher shows an elegant way to compare multiple nullable columns on two tables:

Because we’ve added OR conditions into the mix, we’re forced to use the Nested Loop join, which loops over table B for every single row in A. That’s a lot of index scans and it comes with a hefty price tag.

Here’s an absolutely eye-wateringly beautiful pattern that I found on the Interwebs (though I forgot where) the other day.

This is an interesting use of INTERSECT.  Check it out.
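
The idea, roughly (with hypothetical tables A and B): INTERSECT treats two NULLs as equal, so an EXISTS over an INTERSECT of the columns matches rows where each pair of columns is either equal or both NULL, with no OR predicates required.

SELECT a.ID
FROM dbo.A AS a
    INNER JOIN dbo.B AS b ON EXISTS
    (
        SELECT a.Col1, a.Col2   -- nullable columns from A
        INTERSECT
        SELECT b.Col1, b.Col2   -- nullable columns from B
    );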


Handling Delimited Strings

Ed Pollack looks at several methods of creating delimited strings, and then several methods of breaking out delimited strings:

The execution plan is cut off, but you can be assured that there are six more similar plans below the ones pictured here. These metrics are misleading as each loop doesn’t seem too bad, right? Just 9% of the subtree cost or a few hundred reads doesn’t seem too wild, but add up all of these costs and it becomes clear that this won’t scale. What if we had thousands of rows to iterate through? For 5,000 rows, we would be looking at about 147,995,000 reads! Not to mention a very, very long execution plan that is certain to make Management Studio crawl as it renders five thousand execution plans.

Alternatively, we could cache all of the data in a temp table first, and then pull it row-by-row. This would result in significantly fewer reads on the underlying sales data, outperforming cursors by a mile, but would still involve iterating through the temp table over and over. For the scenario of 5,000 rows, we’d still have an inefficient slog through a smaller data set, rather than crawling through lots of data. Regardless of method, it’s still navigating quicksand, just with varying amounts of quicksand.

This is a good article and includes STRING_SPLIT and OPENJSON methods in SQL Server 2016, but does not include building your own CLR methods.  Check it out if you need to do any delimited string manipulation.
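
For reference, the two built-in approaches from SQL Server 2016 look like this:

-- STRING_SPLIT: one column of values; output order is not guaranteed
SELECT value
FROM STRING_SPLIT('10,20,30,40', ',');

-- OPENJSON: treat the list as a JSON array; the key column preserves position
SELECT [key] AS Position, [value] AS Item
FROM OPENJSON('[10,20,30,40]');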


Unpivoting With APPLY

I have a post on using the APPLY operator to unpivot data:

This code works, but if you have dozens of years, it gets messy writing those case statements and you’re a bit more likely to make a mistake when refactoring code. Here’s a simpler version using CROSS APPLY:

SELECT
    s.Product,
    y.[Year],
    y.Quantity,
    y.[Value]
FROM #Sales s
    CROSS APPLY (VALUES
        (2013, [Qty2013], [Val2013]),
        (2014, [Qty2014], [Val2014]),
        (2015, [Qty2015], [Val2015])
    ) y([Year], Quantity, [Value]);

It’s a little easier to read than the other version, and adding additional years is pretty straightforward. That makes it a great tip when you’re refactoring poorly-thought-out tables or bringing potentially well-thought-out flat files into your system.

APPLY is an elegant solution to so many different classes of problem.
