Press "Enter" to skip to content

Category: T-SQL

Generating Absurd Numbers Of Columns

Brent Ozar wants to generate SmallInt.Max columns:

Alright, so we’ve learned that I can’t return more than 65,535 columns, AND I can only use 4,096 elements in my SELECT. I can think of several workarounds there – 65,535 / 4096 = about 16, which means I could create a few tables or CTEs and do SELECT *’s from them, thereby returning all 65,535 columns with less than 4,096 things in my SELECT. But for now, we’ll just start with 4,096 things in my SELECT:

If you think you need 65K columns returned, I refer you to Swart’s Ten Percent Rule.
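A minimal sketch of that CTE workaround, scaled down to toy size (the CTE and column names here are mine):

    -- Each CTE stays under the 4,096-element SELECT limit; SELECT * across
    -- their cross join returns the combined column set in one statement.
    WITH a AS (SELECT 1 AS a1, 2 AS a2),   -- imagine up to 4,096 columns here
         b AS (SELECT 3 AS b1, 4 AS b2)    -- ...and another batch here
    SELECT *                               -- a single item in the SELECT list
    FROM a CROSS JOIN b;                   -- returns all columns from both CTEs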


Filtering Data

Slava Murygin shows various ways to filter data, in particular data from fn_dblog():

Say we need to extract the information associated with an “UPDATE” for LSNs starting at “0000004f:00000087:0001”. You can just specify the starting and ending LSNs as “fn_dblog” parameters:

[…]

That portion of code would return ONLY the log records between LSNs “0000004f:00000087:0001” and “0000004f:00000088:0001”.

Slava’s post uses fn_dblog() as an example but the techniques are applicable across the board, and in practice sum up to “get the fewest number of rows and fewest number of columns you need to solve the problem at hand.”
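As a hedged sketch of that advice in action (the LSN bounds come from the quote; the operation filter is my addition, since UPDATEs typically surface in the log as LOP_MODIFY_ROW):

    SELECT [Current LSN], Operation, [Transaction ID]  -- only the columns needed
    FROM fn_dblog('0000004f:00000087:0001',            -- starting LSN
                  '0000004f:00000088:0001')            -- ending LSN
    WHERE Operation = N'LOP_MODIFY_ROW';               -- only the rows needed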


Joins Versus NOT IN Clause

Kevin Hill explains a potential performance difference between using NOT IN and using a left join:

Basic stuff, right?  Both will return 951 records (books) that I do not own.  And, very quickly…because the tables are tiny.  Sub-1 second is fast.

The issue here is HOW the rows are compared.

English version now, techy stuff later:

In the first query, this is equivalent to you standing at the bookstore and calling home to have someone check to see if the book in your hand is already in your collection.  EVERY time.  One by one.

In the second, you got really smart and brought a list with you, which you are comparing to the books on the shelf at the store.  You’ve got both “lists” in one place, so it is far more efficient.

Even in the case with a few hundred records, you can see why there’d be a performance difference.
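For reference, the two shapes being compared look roughly like this (table and column names are hypothetical):

    -- Form 1: NOT IN -- each candidate row is checked against the subquery
    SELECT s.BookID
    FROM dbo.BooksAtStore AS s
    WHERE s.BookID NOT IN (SELECT m.BookID FROM dbo.MyBooks AS m);

    -- Form 2: LEFT JOIN ... IS NULL -- both "lists" compared in one pass
    SELECT s.BookID
    FROM dbo.BooksAtStore AS s
        LEFT JOIN dbo.MyBooks AS m ON m.BookID = s.BookID
    WHERE m.BookID IS NULL;

One caveat worth adding: NOT IN returns no rows at all if the subquery can produce a NULL, which is one more argument for the join form.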


Installing R Packages In SQL Server

Tomaz Kastrun shows how to install packages in SQL Server R Services:

Julie Koesmarno made a great post on installing R packages; please follow her post. Microsoft also suggests its own way to install R packages on MSDN.

Since I wanted to be able to have packages installed directly from SQL Server Management Studio (SSMS), here is yet another way to do it. I have used xp_cmdshell to install any additional package for my R (optionally, you can set EXECUTE AS USER).

This is a bit of a backdoor method, but it does work.
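For the curious, a sketch of what the xp_cmdshell route looks like; the Rscript path below is the SQL Server 2016 default and is an assumption on my part, so adjust it to your instance:

    -- xp_cmdshell is off by default; enabling it is what makes this a backdoor
    EXEC sp_configure 'show advanced options', 1;  RECONFIGURE;
    EXEC sp_configure 'xp_cmdshell', 1;  RECONFIGURE;

    -- Shell out to Rscript and install a package into the instance's library
    EXEC xp_cmdshell '"C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\R_SERVICES\bin\Rscript.exe" -e "install.packages(''ggplot2'')"';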


Human-Readable Ranges

Daniel Hutmacher shows us how to build human-readable ranges of integers and dates:

This is a real-world problem that I came across the other day. In a reporting scenario, I wanted to output a number of values in an easy, human-readable way for a report. But just making a long, comma-separated string of numbers doesn’t really make it very readable. This is particularly true when there are hundreds of values.

So here’s a powerful pattern to solve that task.

I really like this.  It takes the gaps & islands problem and goes one step further.
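A minimal version of the islands step (not Daniel’s exact code): a value minus its ROW_NUMBER() is constant within each consecutive run, so grouping on that difference collapses each run to a min/max pair:

    WITH nums AS (
        SELECT n FROM (VALUES (1), (2), (3), (5), (6), (9)) AS v(n)
    ), islands AS (
        SELECT n, n - ROW_NUMBER() OVER (ORDER BY n) AS grp
        FROM nums
    )
    SELECT CASE WHEN MIN(n) = MAX(n) THEN CAST(MIN(n) AS varchar(20))
                ELSE CAST(MIN(n) AS varchar(20)) + '-' + CAST(MAX(n) AS varchar(20))
           END AS [range]              -- yields 1-3, 5-6, 9
    FROM islands
    GROUP BY grp
    ORDER BY MIN(n);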


Differences In Type

Grant Fritchey explains the differences between table variables, temporary tables, and common table expressions:

Don’t go getting all excited. I recognize that these two plans look similar, but they are different. First, let me point out that we have more reads, with 1546, and an increase in duration to 273ms. This comes from two places. First, we’re creating statistics on the data in the temporary table where none exist on the table variable. Second, because I want to run this script over and over, I’m including the DROP TABLE statement, which is adding overhead that I wouldn’t see if I treated it like the table variable (which I could, but not here). However, breaking down to the statement level, I get 250ms duration, just like with the table variable, but I see 924 reads.

What’s going on?

There certainly are differences between the three.  Read on for more details.
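For readers keeping score, here are the three objects under discussion in their simplest forms (the column list is mine):

    DECLARE @t TABLE (id int PRIMARY KEY);  -- table variable: no column statistics

    CREATE TABLE #t (id int PRIMARY KEY);   -- temp table: statistics can be created
    DROP TABLE #t;                          -- the DROP is the overhead Grant mentions

    WITH cte AS (SELECT 1 AS id)            -- CTE: an inlined query expression with
    SELECT id FROM cte;                     -- no object or statistics of its own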


Range And Variance

Mala Mahadevan looks at calculating range, variance, and standard deviation in R and T-SQL:

The first and most common measure of dispersion is called ‘Range‘. The range is just the difference between the maximum and minimum values in the dataset. It tells you how much gap there is between the two and therefore how wide the dataset is in terms of its values. It is, however, quite misleading when you have outliers in the data. If you have one value that is very large or very small, that can skew the Range, and it does not really mean you have values spanning the minimum to the maximum.

To lessen this kind of issue with outliers, a second variation of the range, called the Inter-Quartile Range or IQR, is used. The IQR is calculated by dividing the dataset into 4 equal parts after sorting the values in ascending order. For the first and third parts, the maximum values are taken and then subtracted from each other. The IQR ensures that you are looking at near-top and near-bottom ranges, and therefore the value it gives probably spans the bulk of the data.

Just like her previous post, this one also includes an example built for SQL Server R Services.
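A hedged T-SQL sketch of both measures, against a hypothetical dbo.Measurements(val) table; PERCENTILE_CONT supplies the quartiles:

    SELECT DISTINCT
           MAX(val) OVER () - MIN(val) OVER ()  AS [Range],
           PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY val) OVER ()
         - PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY val) OVER ()  AS [IQR]
    FROM dbo.Measurements;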


Simplified Order Of Operations

Michael J. Swart looks at how SQL Server implements order of operations:

I have a book on my shelf called Practical C Programming published by O’Reilly (the cow book) by Steve Oualline. I still love it today because although I don’t code in C any longer, the book remains a great example of good technical writing.

That book has some relevance to SQL today. Instead of memorizing the full list of operators and their precedence, Steve gives a practical subset:

    1. * (Multiply), / (Division)
    2. + (Add), - (Subtract)

Put parentheses around everything else.

Parentheses, even when unnecessary, are usually a good idea.  They help the reader understand what was going through your mind at the time.
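Or, to see the rule in one line of T-SQL:

    SELECT 2 + 3 * 4    AS without_parens,  -- 14: * binds before +
           (2 + 3) * 4  AS with_parens;     -- 20: parentheses override precedence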
