Let’s start with what we are told about this new feature. According to the documentation, the new UTF-8 Collations:
can be used …
- as a database-level default Collation
- as a column-level Collation
- by appending “
_UTF8” to the end of any Supplementary Character-Aware Collation (i.e. either having “
_SC” in their name, or being of level
- with only the
(implied) have no effect on
NVARCHARdata (meaning: for these types, the UTF-8 Collations behave the same as their non-UTF-8 equivalents
“This feature may provide significant storage savings, depending on the character set in use.” (emphasis mine)
Solomon takes his normal, thorough approach to the problem and finds several issues.
Why are the salaries stored as nvarchar and formatted with commas, spaces, and periods?
Great question! Someone wanted to make sure these amounts would look good in the UI so storing the formatted values in the database would be the way to go…
Pretty for the UI, not really great for needing to do analysis on.
TRY_PARSE is slow, but for a one-time conversion as you try to fix a bad idea, it’s fine.
The time zone name is taken from a list maintained in the following Windows registry hive:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones
Note: For SQL Server on Linux (and Docker containers), a registry shim performs the same function as the Windows registry by intercepting the API calls and returning the expected value(s).
We can also use a Transact-SQL (T-SQL) query against the system view
sys.time_zone_info, which uses the information from the registry hive. This is the recommended method if you do not have access to the registry hive.
Click through for a couple of examples.
This week’s Quizletter featured a quiz on using the AVG() function in SQL Server.
I was inspired to write this quiz because I’ve been teaching TSQL School each week. When we covered aggregate functions, I remembered just how tricksy and non-average that AVG() can be.
Want to take the quiz before you read the answers?
Head over here and take the quiz first.
Or read this post and then take the quiz, if you prefer. I’m not gonna call it cheating, because it’s learning either way.
Kendra explains each of the answers, so I’d recommend taking the quiz first.
Now here’s something we didn’t cover in the video and is especially helpful if something just WILL NOT work and you’ve spent all morning panic eating biscuits.
You can write checks to see if something is numeric, or an integer, with
The general “‘is.XXXXX()’” function will take many of the data types we cover here and more, and can be a real time/life saver.
We could also use
class()here and inspect the result.^[You might recall that
class(1)had the result of “numeric” – R was not by default considering 1 as an integer for the purpose of the
class()function. ### Special numbers As well as
ito denote imaginary numbers, there are some additional symbols you might encounter or want to use.
There’s a video as well as a full blog post.
It occurred to me that we haven’t covered the
TIMESTAMPdata type in this series about dates and times.
TIMESTAMPis the Windows Millennium Edition of data types. It has nothing to do with date and time. It’s a row version. Microsoft asks that we stop calling it
DECIMALis a synonym of
NUMERIC, so too is
TIMESTAMPa synonym of
ROWVERSION. Please call it a
ROWVERSIONand pretend that
TIMESTAMPdoesn’t exist. Microsoft is deeply sorry for the confusion.
As I say, dates and times are hard. But at least this is easy: if you don’t use it, you won’t have problems with it.
There are no other textual/alpha string values that will cast to a bit value, but the numeric values that will cast to a bit are voluminous (even some that are in string format). Consider the following eight statements:
SELECT CAST(100 AS bit);
SELECT CAST(-100 AS bit);
SELECT CAST(99999999999999999999999999999999999999 AS bit);
SELECT CAST(-99999999999999999999999999999999999999 AS bit);
SELECT CAST(88.999999 AS bit);
SELECT CAST('1' AS bit);
SELECT CAST('2' AS bit);
SELECT CAST('999999' AS bit);
Danged if they didn’t all work, and all return 1.
Check out what else Louis tries to cast to a bit type.
DATETIMEOFFSETworks the same way as the
DATETIME2data type, except that it is also time zone aware. It is formatted as
Got all that?
YYYYrepresents a four-digit year,
MMis a two-digit month between 1 and 12,
DDis a two-digit day between 1 and 31 depending on the month,
HHrepresents a two-digit hour between 0 and 23,
mmis the minutes between 0 and 59, while
ssis the number of seconds between 0 and 59. Once again,
nrepresents between zero and seven decimal places in a fraction of a second.
The main difference from
DATETIME2is the time zone offset at the end, which is the number of hours and minutes as an offset from UTC time.
Read on for more. I generally don’t use this date type much, preferring to stick with
DATETIME2 and saving data as UTC.
This week, we look at the
DATETIME2data type. I’m not the first person to think that this was probably not the best name for a data type, but here we are, a decade later.
DATETIME2is, at its heart, a combination of the
TIMEdata types we covered in previous weeks.
DATEis 3 bytes long and
TIMEis between 3 and 5 bytes long depending on accuracy. This of course means that
DATETIME2can be anything from 6 to 8 bytes in length.
Nowadays, if you want to store a date plus time, this should be your default, not
This week, we look at the
TIMEdata type. It is formatted as
HHis hours between 0 and 23,
mmis minutes between 0 and 59,
ssis seconds between 0 and 59, and
frepresents 0 or more fractional seconds, up to a maximum of seven decimal places.
With a maximum length of 5 bytes,
TIMEcan store a value with a granularity of up to 100 nanoseconds.
I tend not to use
TIME very often. It’s useful if you need it, but I rarely find myself needing a dateless time.