Thoughts On UTF-8 Encoding In SQL Server 2019

Solomon Rutzky digs into UTF-8 support in SQL Server 2019 and finds a few bugs:

Let’s start with what we are told about this new feature. According to the documentation, the new UTF-8 Collations:

  1. can be used …

    1. as a database-level default Collation
    2. as a column-level Collation
    3. by appending “_UTF8” to the end of any Supplementary Character-Aware Collation (i.e. either having “_SC” in their name, or being of level 140 or newer)
    4. with only the CHAR and VARCHAR data types
  2. (implied) have no effect on NCHAR and NVARCHAR data (meaning: for these types, the UTF-8 Collations behave the same as their non-UTF-8 equivalents)

  3. “This feature may provide significant storage savings, depending on the character set in use.” (emphasis mine)
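To see why the savings depend "on the character set in use," it can help to compare raw encoded sizes. The sketch below is a rough illustration in Python, not a measurement taken from SQL Server: it assumes NVARCHAR data is stored as UTF-16 and that VARCHAR data under a "_UTF8" Collation is stored as UTF-8, and it ignores row overhead and compression. The sample strings and the `encoded_sizes` helper are made up for this example.

```python
# Compare encoded sizes of the same text in UTF-8 and UTF-16.
# Assumption: UTF-16 approximates NCHAR/NVARCHAR storage, and UTF-8
# approximates CHAR/VARCHAR storage under a _UTF8 collation.
# The sample strings are hypothetical, chosen to cover different ranges.
samples = {
    "ascii": "Hello, world",
    "latin": "Grüße aus Köln",
    "cjk": "日本語のテキスト",
    "emoji": "😀😀",
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for label, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(f"{label}: UTF-8={utf8_bytes} bytes, UTF-16={utf16_bytes} bytes")
```

ASCII-only text takes half the space in UTF-8, accented Latin text saves somewhat less, supplementary characters such as emoji break even at four bytes each, and CJK text actually grows from two bytes per character to three. That is the sense in which the savings are conditional.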

Solomon takes his normal, thorough approach to the problem and finds several issues.

