Issues With Bulk Inserting Multi-Byte Characters In Fixed Width Files

Kevin Feasel

2018-09-13

Bugs, T-SQL

Randolph West shares an example of an issue with BULK INSERT:

Fellow Canadian Doran Douglas brought this issue to my attention recently, and I wanted to share it with you as well.

Let’s say you have a file in UTF-8 format. What this means is that some of the characters will be single-byte, and some may be more than that.

Where this becomes problematic is that a fixed-width file has fields that are, well, fixed in size. If a Unicode character requires more than one byte, it’s going to cry havoc and let slip the dogs of truncation.

Click through for an example.  This seems like a bug to me—I interpret fixed-width as fixed number of characters, not fixed number of bytes.  At the very least, it’s liable to cause confusion.

Related Posts

Using Calendar Tables

I have a post up on using calendar tables: There’s one problem with picking a SQL Saturday in April: Easter and Passover tend to run right around that time, and nobody wants a SQL Saturday on Passover or the day before Easter. Unfortunately, our calendar table doesn’t include holiday information. So let’s add it! Working […]

Read More

A Rant About ORMs

Ned Otter is not a fan of ORMs: I’ve seen a lot of tech come and go in my time, but nothing I’ve seen vexes me more than “framework generated SQL”.  No doubt I’m ignorant about some aspects of it, but its usage continues to confound many a DBA. To troubleshoot one of these bad […]

Read More

Categories

September 2018
MTWTFSS
« Aug Oct »
 12
3456789
10111213141516
17181920212223
24252627282930