Switching To Identities

Kevin Feasel



James Anderson shows how to do a table switch to switch a table without an identity column to one with an identity column:

The SWITCH statement can instantly ‘move’ data from one table to another table. It does this by updating some meta data, to say that the new table is now the owner of the data instead of the old table. This is very useful as there is no physical data movement to cause the stresses mentioned earlier. There are a lot of rules enforced by SQL Server before it will allow this to work. Essentially each table must have the same columns with the same data types and NULL settings, they need to be in the same file group and  the new table must be empty. See here for a more detailed look at these rules.

If you can take a downtime, this is pretty easy.  Otherwise, making sure that the two tables are in sync until the switchover occurs is a key problem to keep in mind.

T-SQL Medians

Kevin Feasel



Daniel Hutmacher has a post showing how to calculate medians and percentiles in T-SQL:

Medians as a concept are simple enough. If you have a large number of values, like a range of statistical values, you want to pick the middle one. The median, as opposed to the average is useful for a number of reasons, one of them that you can reduce the effect of so-called outlier values.

The fact that SQL Server doesn’t have a fast, built-in median function surprises me, to be honest.  The best alternative I’ve found was a CLR function in SQL#.

Altering Columns

Kenneth Fisher points out that there are defaults when altering columns:

So here is the thing. When you change one you change them all. That means if you don’t specify a precision when you can then you get the default. That’s not exactly a common problem though. Usually what you are changing is the precision (or possibly the datatype). What is a common mistake is not specifying the nullability.

When modifying DDL, make sure that you keep it consistent and complete.

Luhn Testing In T-SQL

Kevin Feasel



Phil Factor shows us the Luhn algorithm, a quick test to determine if a credit card number is potentially valid:

There are many ways of doing it in SQL. (and Rosetta Code is a good place to view solutions in various other languages). I believe that Peter Larsson holds the record for the fastest calculation of the Luhn test for a sixteen-digit credit card, with this code. As it stands, it isn’t a general solution, but it can be modified for different lengths of bank card.

Phil has two interesting T-SQL functions in the code and wants to find more.

Unit Testing A Function

Steve Jones walks through a practical example of unit testing T-SQL with tsqlt:

However I wanted to add some tests. Does this really work? What if I don’t have a backslash? I thought the best way to do this was with a few tSQLt tests, which I quickly built. The entire process was 5-10 minutes, which isn’t a lot longer than if I had been running random tests myself with a variety of strings.

The advantage of tests is that if I come up with a new case, or another potential bug, I copy the test over, change the string and I have a new test, plus all the regressions. I’m not depending on my memory to run the test cases.

I first put the code in a function, which makes it easier to test.

tsqlt is a great tool for database unit testing.

Aggregates Using OVER

Kevin Feasel



Slava Murygin shows aggregation and windowing using SUM:

As a conclusion: You CAN use “OVER” clause to do the aggregation in three following cases:
1. When data set is extremely small and fits in just one 8 Kb page;
2. When you want to hide your logic from any future developer or even yourself to make debugging and troubleshooting a nightmare;
3. When you really want to kill your SQL Server and its underlying disk system;

That conclusion’s rather pessimistic for my tastes, mostly because Slava’s trying to do the same thing with a window function that he’s doing with a GROUP BY clause and has multiple functions across different windows (including calculations).  Using SUM() OVER() is powerful when you still need the disaggregated values—for example, running totals.

Trusting Constraints

Dennes Torres talks about whether a constraint is trustworthy:

If the check constraint is trustable, it can be used by the query optimizer. For example, if the check constraint avoid values below 100 in a field and a query for 50 arrives, the query optimizer uses the check constraint to stop the query.

The query optimizer can only use the check constraint if it’s trustable, otherwise it could exist in the table records with values below 100, according to our example, and the query would loose these records.

Dennes then goes on to show how you can have non-trustworthy constraints and how to fix the issue.

Defaults Not Guaranteed Equal

Michael J. Swart shows that two DATETIME2 columns with default constraints will not necessarily show the same value upon insertion:

If I want to depend on these values being exactly the same, I can’t count on the default values.

Default constraints will fill in the correct value, but as Michael notes, “the correct value” is calculated each time.  Also, note that his results are about a millisecond off, so if you’re just using DATETIME, the frequency of observation of this occurrence will be lower, as DATETIME is only good to 3 milliseconds.  That’s not a good reason to use DATETIME, though.

Named Constraints

Louis Davidson on naming constraints:

It has long been a habit that I name my constraints, and even if it wasn’t useful for database comparisons, it just helps me to see the database structure all that much eaiser. The fact that I as I get more experience writing SQL and about SQL, I have grown to habitually format my code a certain way makes it all the more interesting to me that I had never come across this scenario to not name constraints.

Temp tables are special.  There’s another reason to have non-named constraints on temp tables inside stored procedures:  it allows for temp table reuse, as shown on slide 21 in this Eddie Wuerch slide deck from SQL Saturday 75 (incidentally, the first SQL Saturday I ever attended).

De-Duplicating Delimited Lists

Kevin Feasel



Phil Factor looks at de-duplicating lists:

So there you have it. With XML tricks and window functions, we have more opportunity for kicking out any need for functions. To use this code, you’d just swap out the select statement that supplied my samples to the routine, for the lists that you want to deduplicate. Sure, this sort of job will never be quick because there are still correlated subqueries in there to upset the CPU! I am intrigued that there are such different ways of doing a solution for this task in SQL server. Are there yet other ways of doing it?

Cf. Aaron Bertrand’s tally table method.  Bonus points if you’re mentally screaming “CLR!”


May 2017
« Apr