Thoughts On UTF-8 Encoding In SQL Server 2019

Solomon Rutzky digs into UTF-8 support in SQL Server 2019 and has found a few bugs:

Let’s start with what we are told about this new feature. According to the documentation, the new UTF-8 Collations:

  1. can be used …

    1. as a database-level default Collation
    2. as a column-level Collation
    3. by appending “_UTF8” to the end of any Supplementary Character-Aware Collation (i.e. either having “_SC” in their name, or being of level 140 or newer)
    4. with only the CHAR and VARCHAR
  2. (implied) have no effect on NCHAR and NVARCHAR data (meaning: for these types, the UTF-8 Collations behave the same as their non-UTF-8 equivalents

  3. “This feature may provide significant storage savings, depending on the character set in use.” (emphasis mine)

Solomon takes his normal, thorough approach to the problem and finds several issues.

Related Posts

Unexpected Results with ANY Aggregate

Paul White points out a couple odd scenarios with the ANY aggregate in SQL Server: The execution plan erroneously computes separate ANY aggregates for the c2 and c3 columns, ignoring nulls. Each aggregate independently returns the first non-null value it encounters, giving a result where the values for c2 and c3 come from different source rows. This is not what the original SQL query specification requested. The same wrong result can […]

Read More

T-SQL Bugs with Joins

Kevin Feasel

2019-06-17

Bugs, T-SQL

Itzik Ben-Gan takes us through four bugs or oddities around joins: The order counts are now correct, but the total freight values are not. Can you spot the new bug? The new bug is more elusive because it manifests itself only when the same customer has at least one case where multiple orders happen to […]

Read More

Categories

October 2018
MTWTFSS
« Sep Nov »
1234567
891011121314
15161718192021
22232425262728
293031