Press "Enter" to skip to content

Category: T-SQL Tuesday

Against VARBINARY(MAX)

Travis Page wants you to think twice before using VARBINARY(MAX):

Brent Ozar put out a call for blogging about your data type of choice. This might be a favorite (or least favorite) datatype. Naturally, varchar(max) and nvarchar(max) are going to have their punishment, deservedly. The datatype I’m not so sure is going to be getting the bludgeoning it deserves is varbinary(max). Sure, there are valid uses for the data type, but having the potential for just shy of 2GB of binary LOB data stored in a database has some negative potentials. Let’s take a look at some of these pitfalls.

Click through for a good reckoning of the downsides. In terms of upside, two good ones that I actively use are:

  • VARBINARY(MAX), when combined with the COMPRESS() function, can efficiently store a lot of text data, as that text data gets gzipped. This works best if you just need to store data but not search through it. Then, DECOMPRESS() when it’s time for that data to show up in an app somewhere.
  • SQL Server Machine Learning Services models are stored in VARBINARY(MAX).

Comments closed

Fragmentation and GUIDs

Chad Callihan doesn’t llike the UNIQUEIDENTIFIER data type:

My mind quickly went to the uniqueidentifier (GUID) data type. It may not be fair but I think of it as my least favorite. The reasoning is more of a pet peeve. Most of the time there’s nothing wrong with the uniqueidentifier data type; however, it makes me cringe if it is the clustering key on a table when an INT would do just fine because it ends up wasting disk space.

On this topic, Jeff Moden had a really great presentation for our SQL Server user group. He has a rather contrarian take and interesting findings on fragmentation in practice. It’s a lengthy and advanced talk, but definitely worth the nearly 2 1/2 hours.

Comments closed

Against Abused Data Types

Reitse Eskens hates misused data types:

First up. A large amount of my work has to do with ETL processes. There are a lot of things that can go wrong there, but one of the main issues is wrong estimations on size. When we read data from a source system to transfer it to the datawarehouse environment we have to match the datatypes. A varchar(10) in the source will have to be a varchar(10) in the target. Easy enough. But now the source gets an update and with that update the source datatype goes from varchar(10) to varchar(12). When the supplier informs us, we change the datatype accordingly and everything is fine. When for some reason the update is missed, issues will arise. Because off course it’s the primary key that got enlarged and duplicates will start to form.

The other way around happens as well. Some tools check out the source, see a varchar column and, when no-one notices, will create a nvarchar(2000) column. Joy will arise when this column contains one or two characters when the optimizer expects at least a thousand characters.

I’m in almost complete agreement with this notion, with the exception that I think sql_variant is an abomination and its existence in a database is ipso facto proof that the designer came up with (or was forced into) a bad solution.

Comments closed

The Geography Data Type

Greg Dodd talks about the GEOGRAPHY data type:

If you read through the docs on data types, you hit the “Other data types” section, and you start scratching your head: cursor, hierarchyid, rowversion. I’m sure XML will have a special place for some (love or hate).

The datatype that I think is the coolest has to be Spatial Geography. Under the covers, it’s probably the same as Geometry, but no where near as much fun. What is Geography? It’s what it sounds like – a way to store Latitude and Longitude data that will let you query it back again and plot it on a map, or measure distances from it.

Also read Rob Farley’s note on the topic, which mirrors my thoughts: spatial data types are quite relevant for comparison work. And a bit of data manipulation in the database can save a lot of network traffic.

Comments closed

In Praise of the XML Data Type

Eitan Blumin likes an underappreciated data type:

The xml data type in SQL Server exists since about SQL Server 2005, and it introduced a lot of very powerful and useful capabilities that were never before seen in SQL Server. It was the first-ever data type with built-in CLR methods (i.e. where you can write a dot after a column name and execute some kind of method. For example: mycolumn.nodes(…)).

The xml data type and the functionality around it made it relatively easy to “refactor” a resultset from a relational structure into a scalar structure (i.e. a single XML document) using the FOR XML directive, and vice versa (single XML document into a relational structure) using the nodes()value(), and query() methods.

Show me a thousand posts and I don’t expect to see XML show up as a favorite type more than three or four times. Eitan also shares a least favorite type, and I do expect that one to show up on the most-hated list quite frequently.

Comments closed

A Review of Numerical Data Types

Lina Kovacheva reviews SQL Server’s numerical data types:

SQL Server provides two dedicated data types for storing monetary values. You can think of MONEY and SMALLMONEY as DECIMAL – Money is effectively the same as DECIMAL(19,4) while SMALLMONEY is effectively the same as DECIMAL(10,4). If you are planning to use MONEY you should have in mind that performing division and multiplication can cause rounding errors that result in the unintentional loss of precision. The cause of the problem is that MONEY only saves information up to the 4th decimal place and if your multiplication or division results in an integer that goes to the 5th decimal place or more, MONEY will round it off, causing an accuracy error.

Click through for an analysis of these data types.

Comments closed

The Unique Properties of DateTimeOffset

Rob Farley analyzes a special data type:

And as I have a unique index on this, it won’t let me insert 00:30 in UTC+11, because 00:00 in UTC+10:30 is already there. It tells me “Msg 2627, Level 14, State 1, Line 19. Violation of PRIMARY KEY constraint ‘pkTimesOffset’. Cannot insert duplicate key in object ‘dbo.TimesOffsets’. The duplicate key value is (2021-01-01 00:30:00.0000000 +11:00).”

My general rule is to store everything in SQL Server as UTC. If I did not do this, I would very strongly advocate for using DateTimeOffsets regardless of the extra data length. I’ve experienced the pain of mismatched date and time details one too many times for that.

Fun bonus fact: the same applies to .NET as well. If I control the system, I’m using DateTime.UtcNow for everything. If not, I’m leaning heavily toward DateTimeOffset by default. Again, too many times have I experienced that source system X has times marked in Pacific Standard Time pushing data to a server in Eastern Standard Time, and then mixing in a server based in Central Standard Time and having people confused because “the times are wrong.”

Comments closed

SQL Server Data Types: Bit vs the World

Kevin Chant is a fan of the bit type:

I decided to tweak it a bit for this post, to provide a humorous comparison between the bit data type and others that are available in SQL Server. In reality, this won’t cover every single one.

By the end of this post, you will some see pitfalls to using certain data types and some tips on how to avoid them. Plus, you will find out which data type is my least favourite to use.

Click through for a “haha-just-serious” take on an underappreciated datatype which nonetheless can’t decide if it’s a boolean or not.

Comments closed