Thoughts On UTF-8 Encoding In SQL Server 2019

Solomon Rutzky digs into UTF-8 support in SQL Server 2019 and has found a few bugs:

Let’s start with what we are told about this new feature. According to the documentation, the new UTF-8 Collations:

  1. can be used …

    1. as a database-level default Collation
    2. as a column-level Collation
    3. by appending “_UTF8” to the end of any Supplementary Character-Aware Collation (i.e. either having “_SC” in their name, or being of level 140 or newer)
    4. with only the CHAR and VARCHAR
  2. (implied) have no effect on NCHAR and NVARCHAR data (meaning: for these types, the UTF-8 Collations behave the same as their non-UTF-8 equivalents

  3. “This feature may provide significant storage savings, depending on the character set in use.” (emphasis mine)

Solomon takes his normal, thorough approach to the problem and finds several issues.

Parsing Numeric Values From Multiple Cultures

Bert Wagner shows us a good way of converting strings to numbers when multiple cultures are in play:

Why are the salaries stored as nvarchar and formatted with commas, spaces, and periods?

Great question!  Someone wanted to make sure these amounts would look good in the UI so storing the formatted values in the database would be the way to go…

Pretty for the UI, not really great for needing to do analysis on.

Admittedly, TRY_PARSE is slow, but for a one-time conversion as you try to fix a bad idea, it’s fine.

Using AT TIME ZONE In SQL Server

Randolph West looks at the AT TIME ZONE clause when working with a specific time zone in SQL Server:

The time zone name is taken from a list maintained in the following Windows registry hive: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones

Note: For SQL Server on Linux (and Docker containers), a registry shim performs the same function as the Windows registry by intercepting the API calls and returning the expected value(s).

We can also use a Transact-SQL (T-SQL) query against the system view sys.time_zone_info, which uses the information from the registry hive. This is the recommended method if you do not have access to the registry hive.

Click through for a couple of examples.

AVG And Data Types

Kendra Little explains how the AVG() function works with a couple different data types:

This week’s Quizletter featured a quiz on using the AVG() function in SQL Server.

I was inspired to write this quiz because I’ve been teaching TSQL School each week. When we covered aggregate functions, I remembered just how tricksy and non-average that AVG() can be.

Want to take the quiz before you read the answers?

Head over here and take the quiz first.

Or read this post and then take the quiz, if you prefer. I’m not gonna call it cheating, because it’s learning either way.

Kendra explains each of the answers, so I’d recommend taking the quiz first.

Data Types In R

Ellen Talbot gives us an overview of the different data types in R:

Now here’s something we didn’t cover in the video and is especially helpful if something just WILL NOT work and you’ve spent all morning panic eating biscuits.

You can write checks to see if something is numeric, or an integer, with is.numeric() or is.integer().

The general “‘is.XXXXX()’” function will take many of the data types we cover here and more, and can be a real time/life saver.

We could also use class() here and inspect the result.^[You might recall that class(1) had the result of “numeric” – R was not by default considering 1 as an integer for the purpose of the class() function. ### Special numbers As well as i to denote imaginary numbers, there are some additional symbols you might encounter or want to use.

There’s a video as well as a full blog post.

Let’s Not Talk About Timestamp

Randolph West hits us with a misnamed SQL Server data type:

It occurred to me that we haven’t covered the TIMESTAMP data type in this series about dates and times.

TIMESTAMP is the Windows Millennium Edition of data types. It has nothing to do with date and time. It’s a row version. Microsoft asks that we stop calling it TIMESTAMP and use ROWVERSION instead.

Much like DECIMAL is a synonym of NUMERIC, so too is TIMESTAMP a synonym of ROWVERSION. Please call it a ROWVERSION and pretend that TIMESTAMP doesn’t exist. Microsoft is deeply sorry for the confusion.

As I say, dates and times are hard.  But at least this is easy:  if you don’t use it, you won’t have problems with it.

Will It Bit?

Louis Davidson wants to see what he can cast to a bit type:

There are no other textual/alpha string values that will cast to a bit value, but the numeric values that will cast to a bit are voluminous (even some that are in string format). Consider the following eight statements:

SELECT CAST(100 AS bit);
SELECT CAST(-100 AS bit);
SELECT CAST(99999999999999999999999999999999999999 AS bit);
SELECT CAST(-99999999999999999999999999999999999999 AS bit);
SELECT CAST(88.999999 AS bit);
SELECT CAST('1' AS bit);
SELECT CAST('2' AS bit);
SELECT CAST('999999' AS bit);

Danged if they didn’t all work, and all return 1.

Check out what else Louis tries to cast to a bit type.

Using DATETIMEOFFSET

Randolph West continues his date and time data type series:

DATETIMEOFFSET works the same way as the DATETIME2 data type, except that it is also time zone aware. It is formatted as 'YYYY-MM-DD HH:mm:ss[.nnnnnnn][{+|-}hh:mm]'.

Got all that? YYYY represents a four-digit year, MM is a two-digit month between 1 and 12, DD is a two-digit day between 1 and 31 depending on the month, HH represents a two-digit hour between 0 and 23, mm is the minutes between 0 and 59, while ss is the number of seconds between 0 and 59. Once again, n represents between zero and seven decimal places in a fraction of a second.

The main difference from DATETIME2 is the time zone offset at the end, which is the number of hours and minutes as an offset from UTC time.

Read on for more.  I generally don’t use this date type much, preferring to stick with DATETIME2 and saving data as UTC.

DATETIME2 In SQL Server

Randolph West continues his SQL Server date & time data types series:

SQL Server 2008 introduced new data types to handle dates and times in a more intelligent way than the DATETIME and SMALLDATETIME types that we looked at previously.

This week, we look at the DATETIME2 data type. I’m not the first person to think that this was probably not the best name for a data type, but here we are, a decade later.

DATETIME2 is, at its heart, a combination of the DATE and TIME data types we covered in previous weeks. DATE is 3 bytes long and TIME is between 3 and 5 bytes long depending on accuracy. This of course means that DATETIME2 can be anything from 6 to 8 bytes in length.

Nowadays, if you want to store a date plus time, this should be your default, not DATETIME.

The TIME Data Type

Randolph West covers the TIME data type in SQL Server:

This week, we look at the TIME data type. It is formatted as HH:mm:ss.fffffff, where HH is hours between 0 and 23, mmis minutes between 0 and 59, ss is seconds between 0 and 59, and f represents 0 or more fractional seconds, up to a maximum of seven decimal places.

With a maximum length of 5 bytes, TIME can store a value with a granularity of up to 100 nanoseconds.

I tend not to use TIME very often.  It’s useful if you need it, but I rarely find myself needing a dateless time.

Categories

October 2018
MTWTFSS
« Sep  
1234567
891011121314
15161718192021
22232425262728
293031