Supplementary Characters In SQL Server 2017

Solomon Rutzky tests SQL Server 2017 collations to determine whether they support supplementary characters:

As you scroll through the list, does anything appear to be missing? Go ahead, take another look. I’ll wait :wink:. Figure it out? Yep, that’s right: none of those Collations end in “_SC“. Collations ending in “_SC” were added in SQL Server 2012 and support the full UTF-16 character set. Without the “_SC” ending built-in functions and comparisons / sorting only fully support the base UCS-2 character set (i.e. the first 65,536 Unicode Code Points). You can, of course, store and view all Unicode Code Points, even Supplementary Characters, in non-“_SC” Collations, but they will be interpreted as being two separate “unknown” characters instead of as a single character.

To illustrate this point, the following query shows: the two surrogate Code Points (not actual characters by themselves, but when used in pairs of any of the assigned combinations, produce a single character), the resulting Supplementary Character, and how the LEN function interprets that sequence in both non-“_SC” and “_SC” Collations.

Read on for Solomon’s testing, which includes a cat face emoji.  I know that I, for one, thought there was insufficient cat face emoji representation in SQL Server prior to 2017.

Related Posts

The Hidden Performance Costs Of Collation Mismatch

Nate Johnson explains why you want collation consistency when joining tables together on varchar/nvarchar columns: There’s a subtle difference here, vs. those many community blog posts, which I’ll repeat.  The columns are of the same type.  Just different collations. And when the collation on the join predicates is different, bad things happen. Let’s take CustomerNumber for example. On the ERP […]

Read More

More Collation Comparison Fun

Solomon Rutzky wraps up his collation comparison series: Just in case you were wondering: No, the test directly above test does not prove the documentation (as quoted at the top of Part A) correct. Yes, the documentation did state that characters would be converted to the Code Page specified by the Collation of the Database or […]

Read More

Categories

October 2017
MTWTFSS
« Sep Nov »
 1
2345678
9101112131415
16171819202122
23242526272829
3031