Creating The Ultimate Calendar Table

Daniel Hutmacher has started to create a comprehensive calendar table:

It’s a collection of inline table value functions that generate different types of calendars, with a number of properties that could be relevant for a calendar dimension. Each function has a unique date column, so you can join the functions you need together in a view or a procedure. The functions are:

  • Dates: a plain gregorian calendar.

  • Fiscal, annual: a gregorian, year-based calendar where you can define the start of a year, like a corporate fiscal calendar.

  • Fiscal, 4-4-5 or 52/53: a week-based calendar where years comprise four quarters of 4+4+5 weeks respectively.

  • Indian national calendar

  • Persian calendar

  • Thai calendar

  • Dates of Catholic and Orthodox easter

  • Lunar cycle

I was going to jokingly be shocked that this list didn’t include the Hebrew or Islamic calendars, but then Daniel had to ruin my fun by explaining why not.  Check it out and when you’re ready to give it a try, head over to his downloads page.

Building A Calendar Table

Louis Davidson has an example of a calendar table in SQL Server:

The solution is part of my calendar/date dimension code, and it is used to do relative positioning over date periods. For example, say you have the need to get data from the 10 days. You can definitely use a simple between to filter the rows, and a bunch of date functions to group by year, month, etc., generally all of the “normal” groupings. But using a calendar table allows you to prebuild a set of date calculations that make the standard values easier to get, and non-standard groupings possible. The technique I will cover makes moving around in the groupings more easily accessible. Like if you want data from the last 3 complete months. The query to do this isn’t rocket science, but it isn’t exactly straightforward either.

For the example, I will use the calendar table that I have on my website here: http://drsql.org/code in the download SimpleDateDimensionCreateAndLoad, and will load it with data up until 2020. Here is that structure:

Read on for examples of usage.  This is an example where thinking relationally differs from thinking procedurally—imagining date ranges as pre-calculated sets isn’t intuitive to procedural developers, but it can give a big performance boost.

DATENAME In SQL Server

Randolph West continues his dates and times series with a new function, DATENAME():

There are many similarities between DATEPART and DATENAME. Where DATEPART returns the date or time part as an integer, DATENAME returns the part as a character string.

This DATENAME function also takes two parameters: the date or time part we want back, and the input date. Just as we saw with DATEPART, the documentation indicates the input date parameter must be an “expression that can resolve to one of the following data types: datesmalldatetimedatetimedatetime2datetimeoffset, or time.”

Similarly, the date and time parts that can be returned look much like those in DATEPART, which gives us another opportunity for the reminder that we should avoid using the available abbreviations in order to help with writing clearly understandable code.

DATENAME is a useful function for displaying parts of dates & times, but Randolph does lay out the caveats.

Using DATEPART In SQL Server

Randolph West shares some thoughts on the DATEPART function:

As we learned some time ago, an INT (integer) in T-SQL uses four bytes, and has a maximum value greater than zero of over 2 billion (there are more than 4 billion values in an integer if we take the negative values into account).

Why then are date and time parts expressed as an INT, which have a lot of overhead for values like 24, 31, and 60?

There are two reasons:

  1. Integers make things simpler. When writing arithmetic in T-SQL and other programming languages, the default data type is usually an integer. Having to memorise which data types are returned from built-in functions becomes unnecessary when we know it will be an INT. Yes, it uses extra memory, but four bytes is a reasonable trade-off against complexity.

  2. One of the return values is nanoseconds. Although DATETIME2(7) only has a granularity down to 100 nanoseconds, DATEPART allows us to return this value, which requires a data type large enough to contain an integer up to 1 billion (nanoseconds can range from 0 to 999,999,900 in increments of 100).

Randolph also explains what happened a few years back to cause iPhone alarms not to fire on January 1st and 2nd.

Removing Time From A DateTime

Wayne Sheffield compares the performance of four methods for removing time from a DateTime data type:

Today, we’ll compare 3 other methods to the DATEADD/DATEDIFF method:

  1. Taking advantage of the fact that a datetime datatype is stored as a float, with the decimal being fractions of a day and the whole numbers being days, we will convert the datetime to float, taking the floor (just the whole numbers), and converting back to datetime.
  2. Using the DATEADD/DATEDIFF routine.
  3. Converting the datetime to DATE and back to datetime.
  4. Converting the datetime to varbinary (which returns just the time), and subtracting that from the datetime value.

While there are other ways of stripping the time (DATETIMEFROMPARTS, string manipulation), those ways are already known as poorly performing. Let’s just concentrate on these four.

Click through for the methods, as well as a performance test to see which is fastest.

Date And Time Functions To Avoid

Randolph West shares his thoughts on three functions he’d rather you avoid:

CURRENT_TIMESTAMP is the ANSI-equivalent of GETDATE(). ANSI is an acronym for the American National Standards Institute, and sometimes vendors will include ANSI functions in their products so they can say that they’re ANSI-compliant (which is not a bad thing, in most cases).

There are three main problems with CURRENT_TIMESTAMP:

  • No brackets. It goes against the rules about functions. So much for standards!
  • It’s functionally equivalent to GETDATE(), which uses DATETIME, which we previously identified is old and bad.
  • It’s too similar to the poorly-named TIMESTAMP data type, which has nothing to do with dates and times and should be called ROWVERSION.

Bottom line: don’t use CURRENT_TIMESTAMP.

At one point I used CURRENT_TIMESTAMP over GETDATE() with the thought of portability in mind.  Since then, my thoughts on code portability have changed and regardless, as Randolph mentions, it’s better to use DATETIME2 functions to avoid precision issues with DATETIME.

Let’s Not Talk About Timestamp

Randolph West hits us with a misnamed SQL Server data type:

It occurred to me that we haven’t covered the TIMESTAMP data type in this series about dates and times.

TIMESTAMP is the Windows Millennium Edition of data types. It has nothing to do with date and time. It’s a row version. Microsoft asks that we stop calling it TIMESTAMP and use ROWVERSION instead.

Much like DECIMAL is a synonym of NUMERIC, so too is TIMESTAMP a synonym of ROWVERSION. Please call it a ROWVERSION and pretend that TIMESTAMP doesn’t exist. Microsoft is deeply sorry for the confusion.

As I say, dates and times are hard.  But at least this is easy:  if you don’t use it, you won’t have problems with it.

Getting The Current Date And Time In SQL Server

Randolph West shows a few functions which can retrieve current date and time information:

What do we mean by local date and time?

As discussed previously, SQL Server is not time zone aware, nor does it have to be. This is because the operating system that SQL Server runs on can have multiple custom regional settings depending on which user is logged into the server.

This holds true for the SQL Server service account as well, which is just another user on the operating system. When any of these functions is called, it is asking for the date and time from the operating system.

If you’re going to use DATETIME2 (which you generally should), take advantage of the precision that SYSUTCDATETIME() gives you over GETUTCDATE().

DateTime Arithmetic

Claudio Silva points out a difference between the DATETIME and DATETIME2 data types:

I’m currently working on a SQL code migration from Firebird to SQL Server and I hit an error that I haven’t seen for some time.

The error message is the following:

Msg 206, Level 16, State 2, Line 4
Operand type clash: datetime2 is incompatible with int

This ringed the bell right away! Somewhere on the code someone was trying to do an arithmetic calculation without using the proper function.

Read on for the solution.

Helper Predicates And Multi-Column Filters

Rob Farley has an interesting post on optimizing a lookup when you have separate date and time columns:

Here we see a Seek Predicate that looks for OrderDate values between two values that have been worked out elsewhere in the plan, but creating a range in which the right values must exist. This isn’t >= 20110805 00:00 and < 20110806 00:00 (which is what I would’ve made it), it’s something else. The value for start of this range must be smaller than 20110805 00:00, because it’s >, not >=. All we can really say is that when someone within Microsoft implemented how the QO should respond to this kind of predicate, they gave it enough information to come up with what I call a “helper predicate.”

Now, I would love Microsoft to make more functions sargable, but that particular request was Closed long before they retired Connect.

But maybe what I mean is for them to make more helper predicates.

The problem with helper predicates is that they almost certainly read more rows than you want. But it’s still way better than looking through the whole index.

Read the whole thing.

Categories

August 2018
MTWTFSS
« Jul  
 12345
6789101112
13141516171819
20212223242526
2728293031