Thoughts On UTF-8 Encoding In SQL Server 2019

Solomon Rutzky digs into UTF-8 support in SQL Server 2019 and has found a few bugs:

Let’s start with what we are told about this new feature. According to the documentation, the new UTF-8 Collations:

  1. can be used …

    1. as a database-level default Collation
    2. as a column-level Collation
    3. by appending “_UTF8” to the end of any Supplementary Character-Aware Collation (i.e. either having “_SC” in their name, or being of level 140 or newer)
    4. with only the CHAR and VARCHAR
  2. (implied) have no effect on NCHAR and NVARCHAR data (meaning: for these types, the UTF-8 Collations behave the same as their non-UTF-8 equivalents

  3. “This feature may provide significant storage savings, depending on the character set in use.” (emphasis mine)

Solomon takes his normal, thorough approach to the problem and finds several issues.

Related Posts

Parsing Numeric Values From Multiple Cultures

Bert Wagner shows us a good way of converting strings to numbers when multiple cultures are in play: Why are the salaries stored as nvarchar and formatted with commas, spaces, and periods? Great question!  Someone wanted to make sure these amounts would look good in the UI so storing the formatted values in the database […]

Read More

Issues With Bulk Inserting Multi-Byte Characters In Fixed Width Files

Kevin Feasel

2018-09-13

Bugs, T-SQL

Randolph West shares an example of an issue with BULK INSERT: Fellow Canadian Doran Douglas brought this issue to my attention recently, and I wanted to share it with you as well. Let’s say you have a file in UTF-8 format. What this means is that some of the characters will be single-byte, and some may be […]

Read More

Categories

October 2018
MTWTFSS
« Sep  
1234567
891011121314
15161718192021
22232425262728
293031