Press "Enter" to skip to content

Curated SQL Posts

The Problems with VARCHAR(MAX)

Deepthi Goguri has a complaint about VARCHAR(MAX):

Though VARCHAR(MAX) is suitable in situations with large strings of data, it has its own complications that we need to consider. In my career as a DBA, I at least saw couple of times SQL developers using VARCHAR(MAX) when they should not. Fixing the datatypes once in production is painful and causes risks.

I think Deepthi’s advice is sound: use VARCHAR(MAX) when necessary but not as a starting point. I don’t shy away from VARCHAR(MAX) on principle (except with columnstore indexes—get that noise right out of here) and I don’t think you should either, as long as you understand the ramifications.

Comments closed

Date and Time Data Types

Deborah Melkin takes a look at the different date and time data types:

But I think I’ll “wax poetic” about datetimes and all of the varieties. When I first got started with SQL Server back in the SQL 6.5 days, we really just had datetime, which was the date and time. If you needed just the time, you ignored the date portion. If you needed just the date, you put midnight (00:00:00.000) for the time. And when you dealt with the time portion, you deal with rounding to the .003 millisecond.

Read on for more info about each type.

1 Comment

Fragmentation and GUIDs

Chad Callihan doesn’t llike the UNIQUEIDENTIFIER data type:

My mind quickly went to the uniqueidentifier (GUID) data type. It may not be fair but I think of it as my least favorite. The reasoning is more of a pet peeve. Most of the time there’s nothing wrong with the uniqueidentifier data type; however, it makes me cringe if it is the clustering key on a table when an INT would do just fine because it ends up wasting disk space.

On this topic, Jeff Moden had a really great presentation for our SQL Server user group. He has a rather contrarian take and interesting findings on fragmentation in practice. It’s a lengthy and advanced talk, but definitely worth the nearly 2 1/2 hours.

Comments closed

Against VARBINARY(MAX)

Travis Page wants you to think twice before using VARBINARY(MAX):

Brent Ozar put out a call for blogging about your data type of choice. This might be a favorite (or least favorite) datatype. Naturally, varchar(max) and nvarchar(max) are going to have their punishment, deservedly. The datatype I’m not so sure is going to be getting the bludgeoning it deserves is varbinary(max). Sure, there are valid uses for the data type, but having the potential for just shy of 2GB of binary LOB data stored in a database has some negative potentials. Let’s take a look at some of these pitfalls.

Click through for a good reckoning of the downsides. In terms of upside, two good ones that I actively use are:

  • VARBINARY(MAX), when combined with the COMPRESS() function, can efficiently store a lot of text data, as that text data gets gzipped. This works best if you just need to store data but not search through it. Then, DECOMPRESS() when it’s time for that data to show up in an app somewhere.
  • SQL Server Machine Learning Services models are stored in VARBINARY(MAX).

Comments closed

A Mini-Lab: SQL Server 2019 on Docker

Ayman El-Ghazali has a three-parter for us. Part one involves installing Docker for Desktop on Windows and creating a SQL Server container:

Recently, with the help of a colleague at work, I’ve started to dabble a little with containers. I had a customer that requested some specific code to be tested, and I realized that I didn’t have my own local instance of SQL running (always good to have a local one). I decided to try to make this process easier instead of going the traditional route of creating a Virtual Machine and also to help me learn a new technology. In these series of posts, I’m going to document my process of creating a Mini Data Lab for SQL Server on my desktop using Docker. It is intended to be for beginners and in no way is an article for best practices or production deployments.

Part two includes persistent storage and some of the other niceties of hosting a database in a container:

Let’s first take a look at the way I have my disk/folder structure laid out. Again, this is on my personal computer so it’s not a best practice for production and more suitable for development environments.

For each container, I’m creating a separate folder with the MSSQL paths that I need to put my databases, transaction log, and backup files on. Additionally, under the DockerMount folder I have a folder called sqldockershared (which I will put some shared content in later).

Part three is about configuration in existing containers and deploying a second container side-by-side:

For those that are more curious in changing other Instance level properties here is a list of configurable properties for SQL Server on Linux via Bash.
https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-configure-mssql-conf?view=sql-server-ver15

Now we have a great foundation to create another container, so let us go and do that now. The code is similar to the previously created container with the exception of the file path for the data, log, and back files and the port number. The SQLShare path will be the same so that we can run our initialization script from there.

I’m bought-in on containers. There are still some pains around containers for production databases, but “some pain” is a much better experience than a few years ago, when the answer to the question of whether you want to use containers in production for databases was “Are you mad?”

Comments closed

Against Abused Data Types

Reitse Eskens hates misused data types:

First up. A large amount of my work has to do with ETL processes. There are a lot of things that can go wrong there, but one of the main issues is wrong estimations on size. When we read data from a source system to transfer it to the datawarehouse environment we have to match the datatypes. A varchar(10) in the source will have to be a varchar(10) in the target. Easy enough. But now the source gets an update and with that update the source datatype goes from varchar(10) to varchar(12). When the supplier informs us, we change the datatype accordingly and everything is fine. When for some reason the update is missed, issues will arise. Because off course it’s the primary key that got enlarged and duplicates will start to form.

The other way around happens as well. Some tools check out the source, see a varchar column and, when no-one notices, will create a nvarchar(2000) column. Joy will arise when this column contains one or two characters when the optimizer expects at least a thousand characters.

I’m in almost complete agreement with this notion, with the exception that I think sql_variant is an abomination and its existence in a database is ipso facto proof that the designer came up with (or was forced into) a bad solution.

Comments closed

Row-Level Security and UseRelationship

Teo Lachev points out an issue when combining row-level security with the USERELATIONSHIP() function in a Tabular model:

You’ve created a beautiful, wide-open Tabular model. You use USERELATIONSHIP() to switch relationships on and off. Everything works and everyone is pleased. Then RLS sneaks in, such as when external users need access, and you must secure on some dimension table. You create a role, specify a row filter, test the role, and get greeted with:

The UseRelationship() and CrossFilter() functions may not be used when querying ‘<dimension table>’ because it is constrained by row-level security defined on ‘<dimension table>’ or related tables.

Read on to learn what the issue is and one potential workaround.

Comments closed

The Geography Data Type

Greg Dodd talks about the GEOGRAPHY data type:

If you read through the docs on data types, you hit the “Other data types” section, and you start scratching your head: cursor, hierarchyid, rowversion. I’m sure XML will have a special place for some (love or hate).

The datatype that I think is the coolest has to be Spatial Geography. Under the covers, it’s probably the same as Geometry, but no where near as much fun. What is Geography? It’s what it sounds like – a way to store Latitude and Longitude data that will let you query it back again and plot it on a map, or measure distances from it.

Also read Rob Farley’s note on the topic, which mirrors my thoughts: spatial data types are quite relevant for comparison work. And a bit of data manipulation in the database can save a lot of network traffic.

Comments closed

In Praise of the XML Data Type

Eitan Blumin likes an underappreciated data type:

The xml data type in SQL Server exists since about SQL Server 2005, and it introduced a lot of very powerful and useful capabilities that were never before seen in SQL Server. It was the first-ever data type with built-in CLR methods (i.e. where you can write a dot after a column name and execute some kind of method. For example: mycolumn.nodes(…)).

The xml data type and the functionality around it made it relatively easy to “refactor” a resultset from a relational structure into a scalar structure (i.e. a single XML document) using the FOR XML directive, and vice versa (single XML document into a relational structure) using the nodes()value(), and query() methods.

Show me a thousand posts and I don’t expect to see XML show up as a favorite type more than three or four times. Eitan also shares a least favorite type, and I do expect that one to show up on the most-hated list quite frequently.

Comments closed