Press "Enter" to skip to content

Category: Data Types

Thoughts On Data Sizing

Greg Low has some thoughts around data types and sizes:

I was recently at a site where they were changing all their bigint columns to uniqueidentifier columns (ie: GUID columns) because they were worried about running out of bigint values. In a word, that’s ridiculous. While it’s easy to say “64 bit integer”, I can assure you that understanding the size of one is out of our abilities. In 1992, I saw an article that said if you cleared the register of a 64 bit computer and put it in a loop just incrementing it (adding one), on the fastest machine available that day, you’d hit the top value in 350 years. Now machines are much faster now than back then, but that’s a crazy big number.

If you’re at risk of running out of bigints, you have a lot of data—that’s more than 18 quintillion rows that you can hold, and if my quick math is up to snuff, would be roughly 127 thousand petabytes of data just to store the IDs.

Comments closed

Converting DATETIME2 To VARBINARY

Randolph West unravels a mystery around byte lengths:

Quite a lot to take in. Let’s break this down.

DATETIME2 is a data type that was introduced in SQL Server 2008. It uses up to 8 bytes to store a date and time: 3 bytes for the date component, and up to 5 bytes for the time component.

The point here is that it uses 8 bytes in total. That’s it. No more.

Jemma noted that when converting the DATETIME2 data type to BINARY, it suddenly became longer by exactly one byte, which seems strange.

Read on for the solution.

Comments closed

Working With Rowversion Data Types

Louis Davidson walks through some of the properties of rowversion data types:

For years, I had thought (and was probably taught in SQL.AlongTimeAgoInAPlaceFarFarAway) that the timestamp column (well before rowversion was a thing,) was not guaranteed to be an ever increasing value. But this is not the case.

In BOL (https://docs.microsoft.com/en-us/sql/t-sql/data-types/rowversion-transact-sql)  it states:

“The rowversion data type is just an incrementing number…”

This makes it useful for determining rows that have changed, because it it automatic and the user cannot override the value in the column. However, there is a major concern when you use rowversions, and that is what happens when you change the structure of the table, and expect the consumer to see that change. The problem is that when you change the structure of the table, the rowversion will not be set (except when adding a new rowversion column.)

I’m not much of a fan of rowversion and tend not to use it, but my biases don’t have to be yours.

Comments closed

Thinking About Implicit Conversions

Bert Wagner shows how implicit conversions in a predicate can ruin query performance:

Why? Because SQL is performing that implicit conversion to the numeric datatype for every single row in my table. Hence, it can’t seek using the index because it ends up having to scan the whole table to convert every record to a number first.

And this doesn’t only happen with numbers and string conversion. Microsoft has posted an entire chart detailing what types of data type comparisons will force an implicit conversion:

This is one of those things that can easily elude you because the query will often return results in line with what you expect, so until you have a performance problem, you might not even think to check.

Comments closed

Mapping File Shenanigans With The Import/Export Wizard

Angela Henry ran into problems copying a boatload of data from a mainframe-hosted DB2 server and has lived to tell the tale:

This post talks about the issue I ran into with SSIS Mapping Files.

We currently run DB2 on an IBM iSeries AS400 for our ERP system.  I was tasked with copying data from the AS400 to a SQL Server database for some consultants to use.  The C-Suite didn’t want to give the consultants access to our AS400, so this was the work around that was put forth and accepted (and no, no one asked me before I was “voluntold” for the task).  Since this would essentially be a “one-time” thing, I chose to use the Import Export Wizard, but I would save the package just in case they wanted this process repeated.

Sounds like it was a painful experience, but it does have a happy ending.

Comments closed

Partitioning Nullable Columns

Kenneth Fisher looks at what happens when you use a nullable column as a partition key:

So to start with how does partitioning handle a NULL? If you look in the BOL for the CREATE PARTITION FUNCTION you’ll see the following:

Any rows whose partitioning column has null values are placed in the left-most partition unless NULL is specified as a boundary value and RIGHT is indicated. In this case, the left-most partition is an empty partition, and NULL values are placed in the following partition.

So basically NULLs are going to end up in the left most partition(#1) unless you specifically make a partition for NULL and are using a RIGHT partition. So let’s start with a quick example of where NULL values are going to end up in a partitioned table (a simple version).

Click through to see Kenneth’s proof and the repercussions of making that partitioning column nullable.

Comments closed

Benefits Of Deprecated Data Types

Raul Gonzalez shows how to get one of the benefits of older, deprecated data types using (MAX) data types:

We can see that our table is managed by two different allocation units, IN_ROW_DATA and LOB_DATA, which means that all data within columns of the data types above, will end up in different pages by default, regardless of the size of the data.

This is the default behaviour for old LOB types, to be stored separately, but new LOB types (MAX) by default will try to get them In-Row if they are small enough to fit.

Having some of those documents In-Row will result in a serious increase in the number of pages to scan, therefore affecting performance.

Note that for the table scan we have used only the IN_ROW_DATA pages, making it much lighter than if we have to scan the sum of all pages.

This might be helpful for some situations, like where you rarely need to get to the LOB data.

Comments closed

Data Type Mismatches

Kendra Little gets into why certain data type mismatches force scans of tables while others can still allow seeks:

Sometimes we get lucky comparing a literal value to a column of a different type.

But this is very complicated, and joining on two columns of different types in the same family without explicitly converting the type of one of the columns resulted in worse performance in Paul White’s tests, when the columns allowed NULLs! (Note: I haven’t rerun those tests on 2016, but I think the general advice below still applies.)

General advice: don’t rely on being lucky. Pay attention to your data types, and compare values of the same data type wherever possible.

That’s great advice.

Comments closed

Inconsistencies With SQL_VARIANT

Erik Darling warns against using SQL_VARIANT data types:

I half-stumbled on the weirdness around SQL_VARIANT a while back while writing another post about implicit conversion. What I didn’t get into at the time is that it can give you incorrect results.

When I see people using SQL_VARIANT, it’s often in dynamic SQL, when they don’t know what someone will pass in, or what they’ll compare it to. One issue with that is you’ll have to enclose most things in single quotes, in case a string or date is passed in. Ever try to hand SQL those without quotes? Bad news bears. You don’t get very far.

Read on for the demo.  I have never used SQL_VARIANT in any project.  I’ve done a lot of crazy things with SQL Server (some of them intentionally) but never this.

Comments closed

SSRS Category Charts & Ints

Kathi Kellenberger notices an oddity with SSRS Mobile Report category charts:

Notice that OrderYear displays decimal points. I switched the dataset in the Series field name property, and found that neither of the columns in the dataset can be used.

Numeric columns cannot be set as a Series name field. To work around this, I modified the dataset, casting OrderYear as a CHAR(4).

That’s not a great situation, but at least there’s a workaround.

Comments closed