Finding The Real Character Set: Unicode And SQL Server Identifiers

Kevin Feasel



Solomon Rutzky wraps up his series on Unicode and regular identifiers:

The question that I’m trying to answer is: what are the valid “letters” and “decimal numbers” from other national scripts?

I tried using the online research tool “UnicodeSet”, but that gave slightly different results compared (using the “alphabetic” and “numeric_type = decimal” properties) to what I discovered SQL Server actually accepts.

I then loaded the actual Unicode 3.2 data files only to find that the number of characters having either the “alphabetic” or “numeric_type = decimal” properties was different than both the online search and what SQL Server actually accepts.

And so…..

Click through to find the real Unicode killer.
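
For a rough feel of why these counts can disagree, here is a minimal Python sketch. It is not Solomon's method: it does not touch SQL Server or the Unicode data files he loaded. It relies on the Unicode 3.2 database that ships with Python's standard library (`unicodedata.ucd_3_2_0`) and, as an assumption, treats any general category starting with "L" as a letter. That is narrower than the actual "alphabetic" property, so the numbers will not match his. The helper name `count_candidates` is made up for this illustration.

```python
import sys
import unicodedata


def count_candidates(db):
    """Count code points that `db` treats as letters or as decimal digits.

    `db` is either the `unicodedata` module (current Unicode version)
    or `unicodedata.ucd_3_2_0` (the Unicode 3.2 database).
    Assumption: "letter" means general category L*, which is only an
    approximation of the Unicode "alphabetic" property.
    """
    letters = 0
    decimals = 0
    for code_point in range(sys.maxunicode + 1):
        ch = chr(code_point)
        if db.category(ch).startswith("L"):
            letters += 1
        # decimal() returns the default (None here) when the character
        # has no decimal digit value.
        if db.decimal(ch, None) is not None:
            decimals += 1
    return letters, decimals


old_letters, old_decimals = count_candidates(unicodedata.ucd_3_2_0)
new_letters, new_decimals = count_candidates(unicodedata)

print(f"Unicode 3.2: {old_letters} letters, {old_decimals} decimal digits")
print(f"Unicode {unicodedata.unidata_version}: "
      f"{new_letters} letters, {new_decimals} decimal digits")
```

Even this simple comparison shows the totals shifting with the Unicode version and with the property definition used, which is the sort of mismatch the post runs into.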


April 2018