Issues With Bulk Inserting Multi-Byte Characters In Fixed Width Files

Randolph West shares an example of an issue with BULK INSERT:

Fellow Canadian Doran Douglas brought this issue to my attention recently, and I wanted to share it with you as well.

Let’s say you have a file in UTF-8 format. What this means is that some of the characters will be single-byte, and some may be more than that.

Where this becomes problematic is that a fixed-width file has fields that are, well, fixed in size. If a Unicode character requires more than one byte, it’s going to cry havoc and let slip the dogs of truncation.

Click through for an example.  This seems like a bug to me—I interpret fixed-width as fixed number of characters, not fixed number of bytes.  At the very least, it’s liable to cause confusion.

Related Posts

Dealing With Large JSON Values

Bert Wagner investigates an issue he found where his long JSON strings were becoming NULL in SQL Server: After a little bit more research, I discovered that the return type for JSON_VALUE is limited to 4000 characters.   Since JSON_VALUE is in lax mode by default, if the output has more than 4000 characters, it fails silently. […]

Read More

Deleting Top Records With An Order By Clause

Kenneth Fisher shows that deleting the top N records with an ORDER BY clause is not straightforward: Did you know you can’t do this? DELETE TOP (10) FROM SalesOrderDetail ORDER BY SalesOrderID DESC; Msg 156, Level 15, State 1, Line 8 Incorrect syntax near the keyword ‘ORDER’. I didn’t. Until I tried it anyway. Turns […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

September 2018
MTWTFSS
« Aug  
 12
3456789
10111213141516
17181920212223
24252627282930