Press "Enter" to skip to content

Supplementary Characters In SQL Server 2017

Solomon Rutzky tests SQL Server 2017 collations to determine whether they support supplementary characters:

As you scroll through the list, does anything appear to be missing? Go ahead, take another look. I’ll wait :wink:. Figure it out? Yep, that’s right: none of those Collations end in “_SC“. Collations ending in “_SC” were added in SQL Server 2012 and support the full UTF-16 character set. Without the “_SC” ending built-in functions and comparisons / sorting only fully support the base UCS-2 character set (i.e. the first 65,536 Unicode Code Points). You can, of course, store and view all Unicode Code Points, even Supplementary Characters, in non-“_SC” Collations, but they will be interpreted as being two separate “unknown” characters instead of as a single character.

To illustrate this point, the following query shows: the two surrogate Code Points (not actual characters by themselves, but when used in pairs of any of the assigned combinations, produce a single character), the resulting Supplementary Character, and how the LEN function interprets that sequence in both non-“_SC” and “_SC” Collations.

Read on for Solomon’s testing, which includes a cat face emoji.  I know that I, for one, thought there was insufficient cat face emoji representation in SQL Server prior to 2017.