
Category: Data Types

VARCHAR or NVARCHAR

Brent Ozar asks a question:

You’re building a new table or adding a column, and you wanna know which datatype to use: VARCHAR or NVARCHAR?

If you need to store Unicode data, the choice is made for you: NVARCHAR says it’s gonna be me.

But if you’re not sure, maybe you think, “I should use VARCHAR because it takes half the storage space.” I know I certainly felt that way, but a ton of commenters called me out on it when I posted an Office Hours answer about how I default to VARCHAR. One developer after another told me I was wrong, and that in 2025, it’s time to default to NVARCHAR instead. Let’s run an experiment!

This is going back a long way (June of 2020), but one of my earliest YouTube videos is entitled NVARCHAR Everywhere. I’ve gotten a lot better at presentation skills since then (and have a much nicer camera), but I still stand by the arguments.
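For a quick look at the storage difference under debate, here is a minimal T-SQL sketch. The variable names are made up for illustration, and it only shows in-memory byte counts, not what compression may do on disk:

```sql
-- Hypothetical variables holding the same text in both types.
DECLARE @v VARCHAR(20)  = 'Curated SQL';
DECLARE @n NVARCHAR(20) = N'Curated SQL';

-- DATALENGTH() returns the number of bytes used, not the number of characters.
SELECT DATALENGTH(@v) AS varchar_bytes,   -- 1 byte per character for this text
       DATALENGTH(@n) AS nvarchar_bytes;  -- 2 bytes per character for this text
```

For plain ASCII text like this, the NVARCHAR value takes twice as many bytes, which is the "half the storage space" argument in a nutshell.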


Splitting GUIDs into Multiple BIGINTs for Columnstore

Forrest McDaniel performs an experiment:

You may have run into issues with GUIDs as clustering keys, but another major problem is in columnstore. Smart people at Microsoft wrote columnstore in a way to take advantage of modern CPU features, but those CPU features don’t play well with datatypes larger than 8 bytes. Which includes GUIDs.

Read on for the demonstration of this, a clever workaround, and the ramifications of splitting GUIDs into two BIGINTs. Full points for cleverness, though like Forrest, I wouldn’t want to use this in production.
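As a rough idea of what splitting a GUID can look like, here is a T-SQL sketch. It is an assumption about the general shape of the technique rather than Forrest's exact code, and every variable name is hypothetical:

```sql
-- Hypothetical sketch: split a GUID into two BIGINTs and put it back together.
DECLARE @g UNIQUEIDENTIFIER = NEWID();
DECLARE @b BINARY(16) = CAST(@g AS BINARY(16));

DECLARE @hi BIGINT = CAST(SUBSTRING(@b, 1, 8) AS BIGINT);  -- first 8 bytes
DECLARE @lo BIGINT = CAST(SUBSTRING(@b, 9, 8) AS BIGINT);  -- last 8 bytes

-- Reassemble by concatenating the two 8-byte halves.
SELECT @g AS original_guid,
       CAST(CAST(@hi AS BINARY(8)) + CAST(@lo AS BINARY(8)) AS UNIQUEIDENTIFIER) AS rebuilt_guid;
```

Each half fits in 8 bytes, but every lookup and join now needs both columns, which is a big part of why this is awkward to live with.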


Why Not Use VARCHAR(MAX) for Everything?

David Fowler explains:

When I mentioned to the developer that it’s probably not the best idea, he turned around and asked me, ‘why not?’

It was a genuine question. Why shouldn’t we just spam VARCHAR(MAX) over all of our columns? On the upside, it would get rid of all those annoying issues that crop up when we try to insert a value that overflows the datatype.

Click through for a video as well as a blog post laying out some of the problems with using VARCHAR(MAX) all willy-nilly.


Changing a Busy Column’s Data Type in SQL Server

Matt Gantz makes a staggered change:

In a previous post I showed how to use a batching strategy to remove large amounts of data from a table while it is being used. Today I will apply the same technique to another common problem: changing the datatype of a column. A common use of this is to normalize a text column into an integer (that references another table), but could be used to transition to and from any datatype. Many of the considerations in the previous post apply, so I advise you to read it as well before using this technique.

Click through for the process.
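To give a sense of what a batched change might involve, here is a T-SQL sketch. It is not necessarily the process in Matt's post, and the table, column, and lookup names (dbo.Orders, StatusText, StatusId, dbo.OrderStatus) are all hypothetical:

```sql
-- Hypothetical names throughout; a sketch of the general idea only.
-- 1. Add the new column alongside the old one, nullable so existing rows are untouched.
ALTER TABLE dbo.Orders ADD StatusId INT NULL;
GO

-- 2. Backfill in small batches so each transaction stays short.
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (5000) o
    SET    o.StatusId = s.StatusId
    FROM   dbo.Orders AS o
    JOIN   dbo.OrderStatus AS s ON s.StatusName = o.StatusText
    WHERE  o.StatusId IS NULL;

    SET @rows = @@ROWCOUNT;  -- stop once nothing is left to update
END;
```

A real migration would also need to keep the two columns in sync for rows written during the backfill and then swap the columns at the end, which this sketch leaves out.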


Resulting Data Types from a UNION Operation

Andy Brownsword puts on the lab coat and performs some experiments:

The UNION and UNION ALL operators allow us to combine results, but there’s no guarantee that each set of results uses the same data types. So what data types are returned?

For the longest time I thought the data types from the first set of results were used for the final results. That’s not the case.

Read on to see what the rules look like.
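If you want to check a result type yourself, here is a small T-SQL sketch. The query is made up for illustration; SQL Server picks the combined column's type by data type precedence, so here INT wins even though the first branch is VARCHAR:

```sql
-- Each branch returns a different type for column v.
SELECT CAST('1' AS VARCHAR(10)) AS v
UNION ALL
SELECT CAST(2 AS INT);

-- Inspect the resulting column type without running the query itself.
SELECT name, system_type_name
FROM sys.dm_exec_describe_first_result_set(
    N'SELECT CAST(''1'' AS VARCHAR(10)) AS v UNION ALL SELECT CAST(2 AS INT)',
    NULL,
    0);
```

Swapping the order of the two branches should not change the reported type, which is an easy way to see that the first result set does not decide it.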


Domains in ANSI SQL

Joe Celko describes a domain:

For example, there is a domain called voltage which has a base unit called “volt” that’s otherwise meaningless. Yes, you can get a voltmeter and watch the needle, and you can be told what the IEEE specification is for defining how much work a volt should do or how much it should shock you. I’ve discussed scales and types of measurements in a previous article. It’s worth mentioning that you should not confuse domain with the representation and symbols of the units being used. Some domains are limited, such as degrees that measure planar angles. An angle can be from 0 to 360°, or it can be between zero and 2π radians.

Joe has an explanation but doesn’t have any concrete examples in psql. Here’s one from the PostgreSQL documentation:

CREATE DOMAIN us_postal_code AS TEXT
CHECK(
   VALUE ~ '^\d{5}$'
OR VALUE ~ '^\d{5}-\d{4}$'
);

The idea of a domain here is that you define a valid slice of some data type. We can do something similar with check constraints on an attribute, but the difference is that we’d need to create the check constraint for each relevant attribute, whereas the domain would include this check automatically, making it quite useful if we have multiple instances of, say, us_postal_code in our database. Then, we wouldn’t need to worry about creating a check constraint on each instance and ensuring that the code remains the same across the board.

This also leads to a very common sentiment in functional programming: make invalid states unrepresentable. In other words, make it impossible for a person or piece of code to generate a result in an invalid state. By defining a domain with the scope of our valid state, we make it impossible for someone to create a US postal code value that does not pass our check, and so we can’t have dirty data of this sort in our database.
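Here is a short PostgreSQL sketch of the domain in use. It assumes the us_postal_code domain from the documentation example already exists, and the table and column names are hypothetical:

```sql
-- PostgreSQL; hypothetical table that reuses the us_postal_code domain.
CREATE TABLE customer_address (
    address_id  INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    street      TEXT NOT NULL,
    postal_code us_postal_code NOT NULL
);

-- Matches the five-digit pattern, so the insert succeeds.
INSERT INTO customer_address (street, postal_code)
VALUES ('1 Main St', '27701');

-- Only four digits, so the domain's check constraint rejects the row.
INSERT INTO customer_address (street, postal_code)
VALUES ('2 Main St', '2770');
```

Any other table that uses the same domain gets the same check without repeating the pattern.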


Integer Conversion and Rounding in SQL Server

Steve Jones points out a bit of rounding math:

Imagine that I have someone enter a value for the number of hours to include in a report. I enter 5 and the report divides this in half to go back 2.5 hours and forward 2.5 hours. I run this code at the top of my code block:

Click through for Steve’s example. This ultimately has to do with integer division. If you run the following code, you’ll still get 2 as the result:

SELECT CAST(5.99 / 2 AS INT);

This is because SQL Server discards the fractional part when casting to an integer: it truncates rather than rounds. DATEADD() simply works with the end result, post-cast.
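To contrast truncation with rounding, here is a small T-SQL sketch. The column aliases are just labels for illustration:

```sql
SELECT CAST(5.99 / 2 AS INT)           AS truncated,     -- 2.995 is cut down to 2
       CAST(ROUND(5.99 / 2, 0) AS INT) AS rounded,       -- rounded to 3 before the cast
       5 / 2                           AS int_division,  -- both operands are INT, so 2
       5 / 2.0                         AS dec_division;  -- decimal operand keeps the .5
```

If you want rounding behavior, you have to ask for it with ROUND() before the value becomes an integer.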


String Data Types in MySQL and PostgreSQL

Aisha Bukar compares two products:

A very common task in creating a database is to store string data. For example, words, paragraph(s) or even documents. String data types allow you to do just that and store and represent text. They handle everything from simple names and addresses to complex data.

A string is simply a sequence of characters. These characters can be letters, numbers, symbols, or even spaces. For example, “Simple Talk”, “MySQL and PostgreSQL”, “1234” are all strings. Think of each character as a building block. A string is made up of these blocks, arranged in a specific order.

As always, when dealing with different data platform technologies, the small differences are big.


SQL Server Views and Implicit Data Types

Kendra Little takes a peek at a view:

Views let you do dumb things by accident in SQL Server. Then they make you have to think way too hard to fix them.

Most of the time when people create views, they start by refining a SELECT query, then turn it into a view. People also often create multiple views that pull different slices of data and UNION the results together.

Combined, these two things easily lead to undeclared datatypes in views with problematic implicit conversions.

Read on for an example of this problem in action. Kendra’s example involved a view and a separate table, but you can also see this kind of thing pop up in a view that itself contains set operators like UNION.
