DATETIMEis an eight-byte datatype which stores both a date and time in one column, with an accuracy of three milliseconds. As we’ll see though, the distribution of this granularity may not be exactly what we’d expect.
DATETIMEvalues are January 1, 1753 00:00:00.000, through December 31, 9999 23:59:59.997. On older databases designed prior to SQL Server 2008, because there was no explicit support for date values, it was sometimes customary to leave off the time portion of a
DATETIMEvalue, and have it default to midnight on that morning. So for example today would be stored as
February 21, 2018 00:00:00.000.
If you’re not particularly familiar with SQL Server data types, this is detailed enough information to get you going and to explain exactly why you shouldn’t use DATETIME anymore…
Given the following DDLCREATE TABLE dbo.IntToTime ( CREATE_TIME int );
What will be the result of issuing the following command?ALTER TABLE dbo.IntToTime ALTER COLUMN CREATE_TIME time NULL;
Clearly, if I’m asking, it’s not what you might expect.
Click through if you have not memorized your implicit conversion tables.
The worst are by far the string representation, as there is no actual check on the contents. It can literally contain everything. And is ’01/02/2018′ the first of February 2018 (like any sane person would read, because days come before months), or the 2nd of January? So if you have to store dates in your data warehouse, avoid strings at all costs. No excuses.
The integer representation – e.g. 20171208 – is really popular. If I recall Kimball correctly, he said it’s the one exception where you can use smart keys, aka surrogate keys that have a meaning embedded into them. I used them for quite some time, but I believe I have found a better alternative: using the actual date data type.
I bounce back and forth, but I’m sympathetic to Koen’s argument, which you can read by clicking through.
Yes, you have read it… I see dates stored as NVARCHAR(10) and NCHAR(10) on daily basis, please don’t ask me why.
This case is even worse, because DATE takes 3 bytes where NCHAR(10) takes 20 bytes, yes Ladies and Gentlemen more than 6 times more space to store the same data.
But wait! how can you be certain that those ten characters are actually a valid date? You can’t, unless you reinvent the wheel and validate that those dates are obviously valid dates and pay the performance penalty of doing it.
You’d think that picking the right data type for something would be fairly easy and then you find a table with a few dozen NVARCHAR(MAX) columns.
I was recently at a site where they were changing all their bigint columns to uniqueidentifier columns (ie: GUID columns) because they were worried about running out of bigint values. In a word, that’s ridiculous. While it’s easy to say “64 bit integer”, I can assure you that understanding the size of one is out of our abilities. In 1992, I saw an article that said if you cleared the register of a 64 bit computer and put it in a loop just incrementing it (adding one), on the fastest machine available that day, you’d hit the top value in 350 years. Now machines are much faster now than back then, but that’s a crazy big number.
If you’re at risk of running out of bigints, you have a lot of data—that’s more than 18 quintillion rows that you can hold, and if my quick math is up to snuff, would be roughly 127 thousand petabytes of data just to store the IDs.
Quite a lot to take in. Let’s break this down.
DATETIME2is a data type that was introduced in SQL Server 2008. It uses up to 8 bytes to store a date and time: 3 bytes for the date component, and up to 5 bytes for the time component.
The point here is that it uses 8 bytes in total. That’s it. No more.
Jemma noted that when converting the
DATETIME2data type to
BINARY, it suddenly became longer by exactly one byte, which seems strange.
Read on for the solution.
For years, I had thought (and was probably taught in SQL.AlongTimeAgoInAPlaceFarFarAway) that the timestamp column (well before rowversion was a thing,) was not guaranteed to be an ever increasing value. But this is not the case.
In BOL (https://docs.microsoft.com/en-us/sql/t-sql/data-types/rowversion-transact-sql) it states:
“The rowversion data type is just an incrementing number…”
This makes it useful for determining rows that have changed, because it it automatic and the user cannot override the value in the column. However, there is a major concern when you use rowversions, and that is what happens when you change the structure of the table, and expect the consumer to see that change. The problem is that when you change the structure of the table, the rowversion will not be set (except when adding a new rowversion column.)
I’m not much of a fan of rowversion and tend not to use it, but my biases don’t have to be yours.
Why? Because SQL is performing that implicit conversion to the numeric datatype for every single row in my table. Hence, it can’t seek using the index because it ends up having to scan the whole table to convert every record to a number first.
And this doesn’t only happen with numbers and string conversion. Microsoft has posted an entire chart detailing what types of data type comparisons will force an implicit conversion:
This is one of those things that can easily elude you because the query will often return results in line with what you expect, so until you have a performance problem, you might not even think to check.
This post talks about the issue I ran into with SSIS Mapping Files.
We currently run DB2 on an IBM iSeries AS400 for our ERP system. I was tasked with copying data from the AS400 to a SQL Server database for some consultants to use. The C-Suite didn’t want to give the consultants access to our AS400, so this was the work around that was put forth and accepted (and no, no one asked me before I was “voluntold” for the task). Since this would essentially be a “one-time” thing, I chose to use the Import Export Wizard, but I would save the package just in case they wanted this process repeated.
Sounds like it was a painful experience, but it does have a happy ending.
So to start with how does partitioning handle a NULL? If you look in the BOL for the CREATE PARTITION FUNCTION you’ll see the following:
Any rows whose partitioning column has null values are placed in the left-most partition unless NULL is specified as a boundary value and RIGHT is indicated. In this case, the left-most partition is an empty partition, and NULL values are placed in the following partition.
So basically NULLs are going to end up in the left most partition(#1) unless you specifically make a partition for NULL and are using a RIGHT partition. So let’s start with a quick example of where NULL values are going to end up in a partitioned table (a simple version).
Click through to see Kenneth’s proof and the repercussions of making that partitioning column nullable.