Press "Enter" to skip to content

Category: T-SQL

Issues With Bulk Inserting Multi-Byte Characters In Fixed Width Files

Randolph West shares an example of an issue with BULK INSERT:

Fellow Canadian Doran Douglas brought this issue to my attention recently, and I wanted to share it with you as well.

Let’s say you have a file in UTF-8 format. What this means is that some of the characters will be single-byte, and some may be more than that.

Where this becomes problematic is that a fixed-width file has fields that are, well, fixed in size. If a Unicode character requires more than one byte, it’s going to cry havoc and let slip the dogs of truncation.

Click through for an example.  This seems like a bug to me—I interpret fixed-width as fixed number of characters, not fixed number of bytes.  At the very least, it’s liable to cause confusion.

Comments closed

When Scalar Functions Go Bad

Daniel Janik head-fakes us a few times when looking at scalar user-defined function performance:

I’ve read a lot of things lately pointing to scalar functions as if they were the devil. In this blog I’m going to explore if that’s the case. Let’s have a look.

It’s true that in many situations a scalar function is often a performance bottleneck; but, is there a situation where they could be responsibly used?

What if you had a lookup table that almost never changed? Is it worth doing a join on the lookup to get the data you need?

Let’s examine a simple join between a customer address and a state lookup table.

Things are not always as they seem.

Comments closed

Using NUnit For SQL Server Integration Tests

Ben Jarvis shows us how to use NUnit to perform integration testing with SQL Server stored procedures:

I wanted a way to automate the integration testing of my repositories and stored procedures so I developed the solution described below using NUnit as the test framework and SQL Server LocalDB as the database to run my tests against.

I had the following requirements for my solution which NUnit has been able to satisfy:

  • Quick – tests should run quickly and not require massive amounts of set up / tear down

  • Independent – all tests should be independent from one another and responsible for their own set up / tear down

  • Simple – the test code should be simple to understand and easy to work with when writing new tests.

  • Work Everywhere – the tests should be able to work anywhere and not require huge dependencies like a full SQL Server instance, they should be able to work with SQL LocalDB

Read on for the solution.

Comments closed

Finding Gaps In Identity Columns

Shaun J Stuart walks us through a couple of solutions for finding gaps in identity ranges:

Have you ever had random inserts into a large table fail? Most of the time, inserts happen fine, but every so often you get a failure with a “primary key violation” error? If your primary key is an integer column with the identity property, you may be wondering how this is possible.

What is likely happening is your table has grown very large or has been in use for a long time and your identity column ran out of numbers. An integer column has a maximum value of 2,147,483,647. Now an integer can start at -2,147,483,648, but most people  start at 0 or 1, so that leaves you with 2 billion numbers.

This is a specific sub-case of the more general gaps and islands problem.

Comments closed

Finding The First Non-NULL Value In A Window

Bert Wagner shows off the FIRST_VALUE window function and walks us through a case it struggles with:

The SQL Server FIRST_VALUE function makes it easy to return the “first value in an ordered set of values.”

The problem is that if that first value happens to be a NULL, there is no easy, built-in way to skip it.

While a UserVoice item exists to add the ability to ignore nulls (go vote!), today, we’re going accomplish that end result with some alternative queries.

Click through for the demo, as well as a video version of the post.

Comments closed

GOTO And Labels In T-SQL

Ryan Desmond demonstrates the purpose of GOTO in T-SQL:

So I was playing around at work today and decided for whatever reason to see how I could get the code I was writing to fire off only in certain situations.

If it’s Sunday maybe, or if this is in a particular environment, or if a record in an admin table was something specific.  I’m not sure how I’ll use this but I stumbled on Labels and decided to play with them.

Ok, so how to get to know labels.  Well, in order to get them to work sometimes I have to create labels that are based on some criteria.

I do try to avoid these as much as possible, but they are valid syntax and I’ve seen a couple of cases where it makes sense to use GOTO.

1 Comment

Window Functions Have Defaults, Too

Steve Jones reminds us that when running a window function, there is a default window in place:

What I want to do is compare the passing yards each year with the most current value for that player, showing the plus or minus. This means that for Aaron Rodgers, who threw for 1675 yards in 2017, I’d want to show this for the first few years of his career:

This shows me an easy view of the years where he was better in his career than he is now. Last year was likely a down year because of injury, but we’ll see this year.

In any case, if I run this query using LAST_VALUE() for the final year of his career, I don’t get the right results.

It’s good to keep in mind the full syntax for a window function for just this reason.

Comments closed

Shuffling Data And Zipping Results In T-SQL

Phil Factor continues his series on pseudonymization:

The problems come with uncommon values. If you are pseudonymizing a medical database that is required for research purposes on people with potentially embarrassing diseases, and it appears on the dark web, anyone with a rare or unusual surname or first-name comes up on the list, so the shuffle doesn’t help the privacy of Fortescue Ceresole, or whatever his name may be.

If you are spoofing data entirely, you don’t necessarily have this problem because your constructed value will have no relationship to the original value. If it comes from a list of common names or if you randomly create a name ‘Thomas’, it will have no relationship to the original names in the database as long as you did things correctly and shuffle the list. Although a Markov string can produce an identical name that is uncommon, it can be eliminated from the list by an outer join with the original data.

After you shuffle data, you ‘zip’ it. Zipping lists is something you come across in procedural programming, and Linq has a good example. A .net array has an order, and all you are doing is to join by the order of the element in the list. If you randomize that order, you get a shuffle.

Read on for an example using the AdventureWorks Person.Person table.

Comments closed

Last Observation Carried Forward In T-SQL

Pawan Khowal shows one example of implementing Last Observation Carried Forward in T-SQL:

A very close friend given this to me. In this puzzle you have to fill the price of SKU & Color Id for missing months. Note that SKU & Color Id should be considered as a business unit. So you have to set the previous value available to the missing month. Please check out the sample input and the expected output. In this solution I have not considered any performance considerations.

Included is one solution, though there are others.

Comments closed