Press "Enter" to skip to content

Category: T-SQL

Dynamic Unpivoting For Change Detection

Shane O’Neill has a script that dynamically unpivots a pair of rows and compares values column by column, storing the changes in XML:

Overall, the script is longer at nearly double the lines but where it shines is when adding new columns.
To include new columns, just add them to the table; to exclude them, just add in a filter clause.

So, potentially, if every column in this table is to be tracked and we add columns all the way up to 1,024 columns, this code will not increase.
Old way: at least 6,144.
New way: at least 2,048.
Dynamic: no change

Read on for that script.  Even though his developer ended up not using his solution, Shane has made it available for the rest of the world so that some day, someone else can have the maintenance nightmare of trying to root out a bug in the process.

Comments closed

Mann-Whitney U Test in SQL

Phil Factor continues his Statistics in SQL series with the Mann-Whitney U test:

There are several ways that you can test this, but nobody is going to argue with you if you use a Mann–Whitney U test to test whether two samples come from the same distribution. It doesn’t require that the data has any particular type of distribution. It just requires that each observation is done by a different member of the population so that all the observations from both groups are independent of each other. It is really just a test of differences in mean-rank between two populations’ pooled ranking. To test this difference It has to be possible to compare any of the observations with any of the others and say which of the two are greater. Your objective is to disprove the assumption that The distributions of both populations are equal. Calculating a measure of the difference is simple, and was designed to be done easily by hand before computers. The probability that the observed difference occurred by chance is easily calculated for large samples because U then approximates to the normal distribution, but it is complex for small samples. Here, we have a small sample and are just interested in whether the two-tailed test is signifcant at the five percent level so we dodge the bullet by using a significance lookup table for the critical value of U.

Read on for Phil’s implementation of the test.

Comments closed

Batched Data Archival

Anders Pedersen has a script to archive data in smaller batches:

Seeing as the data had to be retrievable for any date, I could not simply delete the very old data.  These tables also had constant inserts and updates into them, so making sure the tables remained available became important, i.e. needed to have acceptable time that the table was being locked, with time for waiting transactions to finish.

The solution I came up with does this with variable size batches.  Now, with modern versions of SQL, there are other ways to do this, but the good thing about this method it works regardless of version of SQL, as well as edition.  Azure SQL DB would need some modification to make it work to archive to a separate database.

Click through for the script.

Comments closed

EXISTS Is Self-Contained

Shane O’Neill ponders an existential problem:

So, drinking my first (of many) coffee of the day, I asked him what was wrong with it.

I have two tables. 1 with values 1,2,3 & the other with values 1,2,3,4,5. When I use delete exists, it should just delete 1,2,3 but table1 is always empty.

Hmmm, not an unreasonable assumption I suppose so I asked him for his code.

Read on for Shane’s explanation, though he doesn’t like the verbosity.  My version is, what happens in EXISTS stays in EXISTS.  It just returns a signal to the outer query saying yea or nay and the outer query does its thing accordingly.  In this case, if you want to tie results back to the delete operation, use IN (the ANSI standard way) or JOIN (typically my preferred way, given that IN can get dicey with more complex criteria).

Comments closed

New T-SQL Functions

Dennes Torres shows off four T-SQL functions releasing in SQL Server 2017:

Translate

Translate does the work of several replace functions, simplifying some queries.

The function is called ‘Translate’ because its main objective: transform one kind of information in another by doing a bunch of replaces.

For example: GeoJson and WKT are two different formats for coordinates. In GeoJson a coordinate is represented using the format ‘[137.4, 72.3]’ while in WKT a point is represented using the format ‘(137.4 72.3)’.

We would need several ‘Replace’s to transform GeoJson format in WKT format and the reverse. The ‘Translate’ function can do this easily.

I knew the first three, but Translate sneaked right past me.

Comments closed

Ditching IsNumeric()

Phil Factor on the IsNumeric() function:

IsNumeric() is actually answering the question ‘Can this string be converted or ‘cast’ to a numeric datatype by SQL Server?’ It is pretty good at getting this answer right but it doesn’t tell you which datatype. It also isn’t interested in such things as overflow.

This problem of IsNumeric() was solved in SQL Server 2012. There was no way of morphing IsNumeric() intro something more valuable, so Try_Cast(), Try_Parse(), and Try_Convert() were introduced. The introduction of these system functions really solve the problem, unless you are still on earlier versions than SQL Server 2012. I’ll show how to do the same thing in previous versions of SQL Server, and demonstrates one or two tricks you can do with these functions.

I’ll stick with the post-2012 version, please.

Comments closed

Building Random Number Ranges

David Fowler shows how to generate a random number for each record in a result set:

Hmmmmm…. It looks like we’ve got the same number for every person, that wasn’t what we wanted.  And that’s my issue with RAND(), it’ll give you a different random number every time it runs but if run as part of a query it’ll always return the same number for every row returned.

So what else can we do?  We’ll there is something that gives us a ‘random’ value for every row in the query and that’s our good old friend NEWID().  Let’s try the same query but this time we’ll swap RAND() with NEWID().

One major use case for this is sampling data sets for model training and testing:  if you pull from a range of 1-10, you could perhaps train against 1-5, cross-validate against 6-7, and test against 8-10.  Doing this instead of TOP X% reduces the likelihood of sampling bias.

Comments closed

Calculating Relative Risk In T-SQL

Mala Mahadevan explains how to calculate relative risk using T-SQL:

In this post we will explore a common statistical term – Relative Risk, otherwise called Risk Factor. Relative Risk is a term that is important to understand when you are doing comparative studies of two groups that are different in some specific way. The most common usage of this is in drug testing – with one group that has been exposed to medication and one group that has not. Or , in comparison of two different medications with two groups with each exposed to a different one.

Read on for an example of a statistical formula calculation which might actually be easier in T-SQL than R.

Comments closed

Cochran-Mantel-Haenszel Test

Mala Mahadevan explains the Cochran-Mantel-Haenszel test, with two parts up so far.  First, her data set:

Below is the script to create the table and dataset I used. This is just test data and not copied from anywhere.

Second, an introduction to the test itself and solutions in R and T-SQL:

This test is an extension of the Chi Square test I blogged of earlier. This is applied when we have to compare two groups over several levels and comparison may involve a third variable.
Let us consider a cohort study as an example – we have two medications A and B to treat asthma. We test them on a randomly selected batch of 200 people. Half of them receive drug A and half of them receive drug B. Some of them in either half develop asthma and some have it under control. The data set I have used can be found here. The summarized results are as below.

This series is not yet complete, so stay tuned.

Comments closed

The REVERSE Function

John Morehouse explains what the REVERSE function is and why it can be useful:

Recently I had a conversation with a colleague that wasn’t aware of the REVERSE function in SQL Server.  I had sent them a code snippet and they inquired as to what the purpose of the function was.  Essentially, this function that reverses any string value.

For example, the phrase:

The brown dog jumped over the lazy fox

reversed looks like this

xof yzal eht revo depmuj god nworb ehT

Read on to see a good example of when you might use REVERSE.

Comments closed