Press "Enter" to skip to content

Category: T-SQL

Joins Versus NOT IN Clause

Kevin Hill explains a potential performance difference between using NOT IN and using a left join:

Basic stuff, right?   Both will return 951 records (books) that I do not own.  And, very quickly…because the tables are tiny.   Sub-1 second is fast.

The issue here is HOW the rows are compared.

English version now, techy stuff later:

In the first query, this is equivalent to you standing at the bookstore and calling home to have someone check to see if the book in your hand is already in your collection.  EVERY time.  One by one.

In the second, you got really smart and brought a list with you, which you are comparing to the books on the shelf at the store.   You’ve got both “lists” in one place, so it is far more efficient.

Even in the case with a few hundred records, you can see why there’d be a performance difference.

Comments closed

Installing R Packages In SQL Server

Tomaz Kastrun shows how to install packages in SQL Server R Services:

Julie Koesmarno made a great post on installing R packages. Please follow this post. Also Microsoft suggests the following way to install R packages on MSDN.

Since I wanted to be able to have packages installed directly from SQL Server Management Studio (SSMS) here is yet another way to do it. I have used xp_cmdshell to install any additional package for my R (optionally you can setEXECUTE AS USER).

This is a bit of a backdoor method, but it does work.

Comments closed

Human-Readable Ranges

Daniel Hutmacher shows us how to build human-readable ranges of integers and dates:

This is a real-world problem that I came across the other day. In a reporting scenario, I wanted to output a number of values in an easy, human-readable way for a report. But just making a long, comma-separated string of numbers doesn’t really make it very readable. This is particularly true when there are hundreds of values.

So here’s a powerful pattern to solve that task.

I really like this.  It takes the gaps & islands problem and goes one step further.

Comments closed

Differences In Type

Grant Fritchey explains the differences between table variables, temporary tables, and common table expressions:

Don’t go getting all excited. I recognize that these two plans look similar, but they are different. First, let me point out that we have more reads with 1546 and an increase in duration to 273ms. This comes from two places. First, we’re creating statistics on the data in the temporary table where none exist on the table variable. Second, because I want to run this script over and over, I’m including the DROP TABLE statement, which is adding overhead that I wouldn’t see if I treated it like the table variable (which I could, but not here). However, breaking down the to the statement level, I get 250ms duration, just like with the table variable, but, I see 924 reads.

What’s going on?

There certainly are differences between the three.  Read on for more details.

Comments closed

Range And Variance

Mala Mahadevan looks at calculating range, variance, and standard deviation in R and T-SQL:

The first and most common measure of dispersion is called ‘Range‘. The range is just the difference between the maximum and minimum values in the dataset. It tells you how much gap there is between the two and therefore how wide the dataset is in terms of its values. It is however, quite misleading when you have outliers in the data. If you have one value that is very large or very small that can skew the Range and does not really mean you have values spanning the minimum to the maximum.

To lower this kind of an issue with outliers – a second variation of the range called Inter-Quartile Range, or IQR is used. The IQR is calculated by dividing the dataset into 4 equal parts after sorting the said value in ascending order. For the first and third part, the maximum values are taken and then subtracted from each other. The IQR ensures that you are looking at top and near-bottom ranges and therefore the value it gives is probably spanning the range.

Just like her previous post, this one also includes an example built for SQL Server R Services.

Comments closed

Simplified Order Of Operations

Michael J. Swart looks at how SQL Server implements order of operations:

I have a book on my shelf called Practical C Programming published by O’Reilly (the cow book) by Steve Oualline. I still love it today because although I don’t code in C any longer, the book remains a great example of good technical writing.

That book has some relevance to SQL today. Instead of memorizing the full list of operators and their precedence, Steve gives a practical subset:

    1. * (Multiply), / (Division)
    2. + (Add), – (Subtract)

Put parentheses around everything else.

Parentheses, even when unnecessary, are usually a good idea.  They help the reader understand what was going through your mind at the time.

Comments closed

Generating Change Scripts

Steve Jones shows off a rather valuable feature in SQL Server Management Studio:

One of the really basic things I think everyone should understand is how to get scripts from Management Studio (SSMS) and saving them. In fact, I’ve written that everyone should use this button and really not ever execute their GUI changes. Capture the script, save that, and automate things.

However, that’s not what this post is about. This post is about how you get a script to look at changes, or better understand how SSMS might implement your changes.

The ability to script out your changes has a number of benefits, one of which is that you’ll get to learn the code you need to write to perform an action, which could make all the difference in a production-down situation.

Comments closed

Common Table Expressions Aren’t Tables

Grant Fritchey shows that CTEs are not tables; they’re expressions:

The Common Table Expression (CTE) is a great tool in T-SQL. The CTE provides a mechanism to define a query that can be easily reused over and over within another query. The CTE also provides a mechanism for recursion which, though a little dangerous and overused, is extremely handy for certain types of queries. However, the CTE has a very unfortunate name. Over and over I’ve had to walk people back from the “Table” in Common Table Expression. The CTE is just a query. It’s not a table. It’s not providing a temporary storage space like a table variable or a temporary table. It’s just a query. Think of it more like a temporary view, which is also just a query.

Read the whole thing.

Comments closed

CTEs And Semi-Colons

Jason Brimhall discusses common table expressions:

Wait! Hold on two seconds there! Surely the semi-colon is an absolute requirement because we see it everywhere that it is a mandatory requirement.

The reality is that the semi-colon requirement is not really entirely accurate. If the CTE happens to be in the same batch, then the previous statement in the batch must be terminated by the semi-colon.

This post went down an unexpected path, and ended up being rather interesting.  Read the whole thing.

Comments closed

Comparing Nullable Columns

Daniel Hutmacher shows an elegant way to compare multiple nullable columns on two tables:

Because we’ve added OR conditions into the mix, we’re forced to use the Nested Loop join, which loops over table B for every single row in A. That’s a lot of index scans and it comes with a hefty price tag.

Here’s an absolutely eye-watering beautiful pattern that I found on the Interwebs (though I forgot where) the other day.

This is an interesting use of INTERSECT.  Check it out.

Comments closed