Press "Enter" to skip to content

Category: Data Types

T-SQL Tuesday 136 Wrap-Up

Brent Ozar rounds up the usual suspects, plus several more:

For this month’s T-SQL Tuesday, I asked you to blog about your most-loved and least-loved data types.

Crazy, right? How could people possibly love or hate data types? Well, if you’ve been working with them for a while, I figured you’d have built up an array of tips or pain points, and y’all delivered with 29 interesting blog posts.

Click through for a lengthy list of interesting posts.

Comments closed

Disliking User-Defined Data Types

Andy Levy has a bone to pick:

Here’s the thing – these types are really just aliases for native types in SQL Server, but more constrained. Constrain yourself to UDTs and you’ll have trouble right-sizing your fields. Let’s say you’ve got three data types for text data:

– myShortString (varchar(10))
– myString (varchar(256))
– myBigString (varchar(8000))

These lengths are not helping anyone. You can’t store email addresses or names in myShortString. But myString is probably way too much for that data. You’re going to waste memory because of how SQL Server estimates memory grants and your indexes will be bloated. Maybe you just need to create more UDTs to cover these situations. But that just compounds the other problems, doesn’t it?

Pushes glasses up the bridge of his nose. Teeeeechnically, an e-mail address may be up to 256 characters long, including a username of up to 64 characters (and maybe two angle brackets, depending on the host). So myString would actually be perfect. Steve Jones has a comment about 300, but that was probably the original standard of 320. Regardless, I realize how far beside the point that is, and Andy’s point is a good one, as well as the several others he makes in the post.

One quick note on defined types: they really do make a lot of sense in a domain-driven design, especially when working with functional programming languages. Defining a CustomerID as an int is fine, but if I know my customer IDs are natural numbers (1, 2, 3, …), 9 digits long, and do not contain the sequence 2345 (because my company considers this an unlucky number sequence), creating a CustomerID type which provides this sort of type checking is great because you keep the rules as close to the data as possible and ensure consistency. It’s also more restrictive than int, so you can cast back down to an int when you’re ready to interact with some remote system. So short answer, do this all day in F#, but not in the database.

Comments closed

In Defense of Float

Hugo Kornelis levies a defense of floating point data types:

Let’s return to the database. Let’s figure out a way to store these numbers appropriately.

Could we use decimal (or its synonym numeric)? Well, yes. We can. We need 25 digits after the decimal place for 0.0000000000000000000004052, and 9 digits before the decimal place for 299900000, so that would fit in a decimal(34,25). But if you try to compute c2 so you can then multiply that to the m, you’ll run into an overflow error.

Hugo does a good job of defending the float data type.

Comments closed

Two Disliked Data Types

Aaron Bertrand has two bones to pick:

I am not often one to do the bare minimum, unless it comes to looking for things around the house. After about 22 seconds I throw my hands in the air and exclaim, “I can’t find it!” As my wife so kindly added: It usually turns out to be in a spot I already looked.

But when it revolves around SQL Server and opinions, I’m all over it. So I’m not going to talk about a data type today; I’m going to talk about two of them. One of them Brent already mentioned in his invite:

I kinda-sorta disagree with Aaron’s second choice. By that I mean that I fully agree with his premise: use UTC everywhere. But if you don’t use UTC everywhere, then use DATETIMEOFFSET everywhere and apply the time zones.

Comments closed

Date and Time Data Types

Deborah Melkin takes a look at the different date and time data types:

But I think I’ll “wax poetic” about datetimes and all of the varieties. When I first got started with SQL Server back in the SQL 6.5 days, we really just had datetime, which was the date and time. If you needed just the time, you ignored the date portion. If you needed just the date, you put midnight (00:00:00.000) for the time. And when you dealt with the time portion, you deal with rounding to the .003 millisecond.

Read on for more info about each type.

1 Comment

The Problems with VARCHAR(MAX)

Deepthi Goguri has a complaint about VARCHAR(MAX):

Though VARCHAR(MAX) is suitable in situations with large strings of data, it has its own complications that we need to consider. In my career as a DBA, I at least saw couple of times SQL developers using VARCHAR(MAX) when they should not. Fixing the datatypes once in production is painful and causes risks.

I think Deepthi’s advice is sound: use VARCHAR(MAX) when necessary but not as a starting point. I don’t shy away from VARCHAR(MAX) on principle (except with columnstore indexes—get that noise right out of here) and I don’t think you should either, as long as you understand the ramifications.

Comments closed

Against VARBINARY(MAX)

Travis Page wants you to think twice before using VARBINARY(MAX):

Brent Ozar put out a call for blogging about your data type of choice. This might be a favorite (or least favorite) datatype. Naturally, varchar(max) and nvarchar(max) are going to have their punishment, deservedly. The datatype I’m not so sure is going to be getting the bludgeoning it deserves is varbinary(max). Sure, there are valid uses for the data type, but having the potential for just shy of 2GB of binary LOB data stored in a database has some negative potentials. Let’s take a look at some of these pitfalls.

Click through for a good reckoning of the downsides. In terms of upside, two good ones that I actively use are:

  • VARBINARY(MAX), when combined with the COMPRESS() function, can efficiently store a lot of text data, as that text data gets gzipped. This works best if you just need to store data but not search through it. Then, DECOMPRESS() when it’s time for that data to show up in an app somewhere.
  • SQL Server Machine Learning Services models are stored in VARBINARY(MAX).

Comments closed

Fragmentation and GUIDs

Chad Callihan doesn’t llike the UNIQUEIDENTIFIER data type:

My mind quickly went to the uniqueidentifier (GUID) data type. It may not be fair but I think of it as my least favorite. The reasoning is more of a pet peeve. Most of the time there’s nothing wrong with the uniqueidentifier data type; however, it makes me cringe if it is the clustering key on a table when an INT would do just fine because it ends up wasting disk space.

On this topic, Jeff Moden had a really great presentation for our SQL Server user group. He has a rather contrarian take and interesting findings on fragmentation in practice. It’s a lengthy and advanced talk, but definitely worth the nearly 2 1/2 hours.

Comments closed

Against Abused Data Types

Reitse Eskens hates misused data types:

First up. A large amount of my work has to do with ETL processes. There are a lot of things that can go wrong there, but one of the main issues is wrong estimations on size. When we read data from a source system to transfer it to the datawarehouse environment we have to match the datatypes. A varchar(10) in the source will have to be a varchar(10) in the target. Easy enough. But now the source gets an update and with that update the source datatype goes from varchar(10) to varchar(12). When the supplier informs us, we change the datatype accordingly and everything is fine. When for some reason the update is missed, issues will arise. Because off course it’s the primary key that got enlarged and duplicates will start to form.

The other way around happens as well. Some tools check out the source, see a varchar column and, when no-one notices, will create a nvarchar(2000) column. Joy will arise when this column contains one or two characters when the optimizer expects at least a thousand characters.

I’m in almost complete agreement with this notion, with the exception that I think sql_variant is an abomination and its existence in a database is ipso facto proof that the designer came up with (or was forced into) a bad solution.

Comments closed

The Geography Data Type

Greg Dodd talks about the GEOGRAPHY data type:

If you read through the docs on data types, you hit the “Other data types” section, and you start scratching your head: cursor, hierarchyid, rowversion. I’m sure XML will have a special place for some (love or hate).

The datatype that I think is the coolest has to be Spatial Geography. Under the covers, it’s probably the same as Geometry, but no where near as much fun. What is Geography? It’s what it sounds like – a way to store Latitude and Longitude data that will let you query it back again and plot it on a map, or measure distances from it.

Also read Rob Farley’s note on the topic, which mirrors my thoughts: spatial data types are quite relevant for comparison work. And a bit of data manipulation in the database can save a lot of network traffic.

Comments closed