Issues With Bulk Inserting Multi-Byte Characters In Fixed Width Files

Kevin Feasel

2018-09-13

Bugs, T-SQL

Randolph West shares an example of an issue with BULK INSERT:

Fellow Canadian Doran Douglas brought this issue to my attention recently, and I wanted to share it with you as well.

Let’s say you have a file in UTF-8 format. This means that some characters are stored as a single byte, while others take two, three, or four bytes.

Where this becomes problematic is that a fixed-width file has fields that are, well, fixed in size. If a Unicode character requires more than one byte, it’s going to cry havoc and let slip the dogs of truncation.
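To see why byte widths and character widths diverge, here is a minimal Python sketch that reports how many bytes each character needs once encoded as UTF-8. Python is used only for illustration, and the helper name `utf8_byte_lengths` is hypothetical:

```python
def utf8_byte_lengths(text: str) -> list[tuple[str, int]]:
    """Return (character, UTF-8 byte count) pairs for each character in text."""
    # str.encode converts one code point into its UTF-8 byte sequence (1 to 4 bytes).
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


if __name__ == "__main__":
    for ch, n in utf8_byte_lengths("Aé€😀"):
        print(ch, n)  # A 1, é 2, € 3, 😀 4
```

A field sized in bytes therefore holds fewer characters as soon as any of them fall outside plain ASCII.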

Click through for an example. This seems like a bug to me: I interpret fixed-width as a fixed number of characters, not a fixed number of bytes. At the very least, it’s liable to cause confusion.
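The difference shows up as soon as a parser slices a record by position. The sketch below splits the same record once by character offsets and once by byte offsets. It assumes a hypothetical two-field layout (6 + 6) and made-up sample values. It is a Python illustration of the byte-versus-character distinction, not how BULK INSERT is implemented internally:

```python
def split_by_chars(line: str, widths: list[int]) -> list[str]:
    """Split a decoded line into fields, counting widths in characters."""
    fields, pos = [], 0
    for w in widths:
        fields.append(line[pos:pos + w])
        pos += w
    return fields


def split_by_bytes(raw: bytes, widths: list[int]) -> list[str]:
    """Split raw UTF-8 bytes into fields, counting widths in bytes."""
    fields, pos = [], 0
    for w in widths:
        # errors="replace" avoids an exception if a cut lands mid-character.
        fields.append(raw[pos:pos + w].decode("utf-8", errors="replace"))
        pos += w
    return fields


if __name__ == "__main__":
    widths = [6, 6]          # hypothetical layout: name (6), city (6)
    record = "Renée Québec"  # 12 characters, but 14 bytes in UTF-8
    print(split_by_chars(record, widths))                  # ['Renée ', 'Québec']
    print(split_by_bytes(record.encode("utf-8"), widths))  # ['Renée', ' Québ']
```

Counting bytes, the "é" in the first field pushes every later field one byte to the right, so the second field picks up the padding space and loses its last two characters.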

