Solomon Rutzky digs into UTF-8 support in SQL Server 2019 and finds a few bugs:
Let’s start with what we are told about this new feature. According to the documentation, the new UTF-8 Collations:
- can be used …
  - as a database-level default Collation
  - as a column-level Collation
  - by appending “_UTF8” to the end of any Supplementary Character-Aware Collation (i.e. either having “_SC” in their name, or being of level 140 or newer)
  - with only the CHAR and VARCHAR data types
- (implied) have no effect on NCHAR and NVARCHAR data (meaning: for these types, the UTF-8 Collations behave the same as their non-UTF-8 equivalents)
- “This feature may provide significant storage savings, depending on the character set in use.” (emphasis mine)
Solomon takes his normal, thorough approach to the problem and finds several issues.
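To see why the storage savings quoted above depend on the character set, it can help to compare raw encoded lengths. The following is a minimal Python sketch, not taken from Solomon's post: the sample strings and the `encoded_sizes` helper are made up for illustration, and it only counts encoded bytes (UTF-8, as the new Collations store in CHAR and VARCHAR, versus UTF-16, as NCHAR and NVARCHAR store). It ignores row overhead, compression, and everything else that affects actual on-disk size in SQL Server.

```python
# Hypothetical sample strings, one per script, chosen for illustration only.
samples = {
    "ASCII (English)": "Hello",
    "Latin with accents": "d\u00e9j\u00e0",            # "deja" with accented e and a
    "Hebrew": "\u05e9\u05dc\u05d5\u05dd",              # four Hebrew letters
    "CJK": "\u65e5\u672c\u8a9e",                       # three CJK ideographs
    "Emoji (supplementary)": "\U0001F600",             # one supplementary character
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for label, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{label:24} UTF-8: {utf8_len:2}  UTF-16: {utf16_len:2}")
```

In this sketch, ASCII text takes half the bytes in UTF-8, the Hebrew and emoji samples come out the same size in both encodings, and the CJK sample is larger in UTF-8 than in UTF-16, which is the sense in which savings depend on the character set in use.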