Press "Enter" to skip to content

Category: Data Types

Expanding INT to BIGINT Without (Much) Downtime

Michael J. Swart finds a creative way around a problem, and then a creative way around the problem it causes, and so on:

The table has over 2 billion rows and it looks like it’s going to run out of space soon because the LogId column is defined as an INT. I need to change this table so that it’s a BIGINT. But changing an INT to a BIGINT is known as a “size of data” operation. This means SQL Server has to process every row to expand the LogId column from 4 to 8 bytes. But it gets trickier than that.

The biggest challenge is that the table has to remain “online” (available for queries and inserts).

Gianluca Sartori (spaghettidba) had the idea of enlarging the columns with no downtime using compression. It’s promising, but I discovered that for this to work, all indexes need to be compressed not just the ones that contain the changed column. Also, any indexes which use the column need to be disabled for this to work.

I gave up on solving this problem in general and constrained my focus to the specific problem I was facing. There’s always some context that lets us bend the rules. In my case, here’s what I did.

Read the whole thing. Also, as soon as I saw Michael’s title, I immediately thought of this:

Leave a Comment

NVARCHAR Everywhere

I get to put on my contrarian hat:

In the last episode of Shop Talk, I laid out an opinion which was…not well received. So I wanted to take some time and walk through my thinking a little more cogently than I was able to do during Shop Talk.

Here’s the short version. When you create a table and need a string column, you have a couple options available: VARCHAR and NVARCHAR. Let’s say that you’re a developer creating a table to store this string data. Do you choose VARCHAR or NVARCHAR? The classic answer is, “It depends.” And so I talk about why that is in video format right below these words.

I have a video which goes into detail, plus a bunch of words. Plus mice and banjos. 🐭🪕

Leave a Comment


Randolph West digs into what a UNIQUEIDENTIFIER looks like in storage:

Let’s take our example GUID again: CC05E271-BACF-4472-901C-957568484405. If we look at the table storage for this row, we’ll find it persisted as follows: 0x71E205CCCFBA7244901C957568484405 (alternating octets are highlighted in bold).

If you haven’t been following this series, this is a good place to remind you that SQL Server stores data using little-endian sequencing on disk and in memory. In the vast majority of cases, bytes are stored in reverse order because that’s how Intel CPUs like their data. However GUIDs are persisted slightly differently because of their sort order.

This is probably the most GUIDs I’ve seen in a single blog post.

Leave a Comment

How SQL Server Stores Floating Point Types

Randolph West continues a series on SQL Server data type storage:

If an integer or decimal amount is a precise representation of a value, a floating point is the closest approximation of that value in binary. Programming languages and databases use floating point numbers to trade storage (and memory) costs against precision. A floating point value is imprecise, but even that is underselling the problem.

Randolph also breaks all of the rules and writes out the largest FLOAT value you can have.

Leave a Comment

Proper Ways to Store Currency Data in SQL Server

Randolph West thinks about ways to store money values in SQL Server:

I completely agree with this statement. Never store values used in financial calculations as floating point values, because a floating point is an approximate representation of a decimal value, stored as binary. In most cases it is inaccurate as soon as you store it. You can read more in this excellent — if a little dry — technical paper.

With that out of the way, we get into an interesting discussion about the correct data type to store currency values.

Randolph states an argument around why DECIMAL(19,4) is fine. And it’s great for most cases, though the one “real” financial system I’ve worked with have money stored as integer types (with SQL Server, that’d be BIGINT) because of precision, especially when working with exchange rates. But for most cases—especially when you’re not building the system of record for financial transactions or accounts—I agree with Randolph that DECIMAL is fine. Dave Wentzel has a great comment explaining even further the reasoning behind integer values for certain monetary columns.

Comments closed

How SQL Server Stores the Money Data Type

Randolph West wants to talk to us about how we use our MONEY:

While MONEY and DECIMAL(19,4) are functionally the same, they are stored differently on disk, and this is where it gets interesting. Let’s use a random amount of $ 4,513.19. Since it’s small enough to fit in both MONEY and SMALLMONEY, we can do a simple experiment. When we ask SQL Server to store this value in a MONEY data type, it will store it (byte-reversed) as 0x7CA8B00200000000. The SMALLMONEY version of this amount would look almost identical, stored as 0x7CA8B002 (without the leading zeroes). A quick look at this byte-reversed value (0x02B0A87C) in a hex calculator gives us the amount of 45,131,900. After moving the decimal point four places to the left, we get our starting value of 4513.1900.

But read on to see how that compares to other data types.

Comments closed

XML Storage in SQL Server

Vladimir Klimov walks us through what happens when we save data to an XML data type:

When working on the release of dbForge Transaction Log, among other tasks, our team had to puzzle out how to properly store typed XML data.

To start with, it is worth mentioning that SQL Server does not store XML in the format it was entered. An XML string is parsed, split to tags, and thus is stored in a compressed format. Description elements that the server considers unnecessary are discarded.

It also should be kept in mind that, if the data type of a column is specified as simple XML, the server will store this data as Unicode strings.

Click through for a couple of examples and their data storage requirements.

Comments closed

The Limitations of Metadata-Only Updates

Eitan Blumin does not like the limitations of metadata-only column changes with SQL Server 2016:

This is an excellent mechanism on the one hand…

However, it’s completely useless when the column you want to change has a CLUSTERED INDEX defined on it (regardless of whether it’s also a PRIMARY KEY or not).

Such a scenario would especially be common with IDENTITY columns (which, ironically, is exactly the kind of examples that Paul presented in his post).

Click through to understand the scope of this limitation.

Comments closed

Detecting and Changing Multiple Columns’ Data Types with Power Query

Imke Feldmann has a quick tip for us:

Very often, when you expand a column in Power Query that contains a table or some records, the expanded columns will lose their types (like Chris Webb has described here for example). Or you might just have accidently deleted a “Changed Type”-step.

Did you know there is actually a superfast and easy way to do it?

Read on to see how you can change the data type for multiple columns all at once.

Comments closed

Date and Time Storage in SQL Server

Randolph West covers the internals of how date and time data types are stored in SQL Server:

DATE is the byte-reversed number of days since the year 0001-01-01, stored as three bytes. It goes up to 9999-12-31, which is stored as 0xDAB937. You can check this value by reversing the bytes and sticking them into a hex calculator. 37 B9 DA equals 3,652,058, which is the number of days since 0001-01-01.

If you try to cast 0xDBB937 as a DATE value (by incrementing the least significant bit DA by 1), it will throw a conversion error. There is obviously some overflow detection that protects against corruption in a date type.

Randolph looks at DATE, TIME, DATETIME(2), and DATETIME and explains how each is storedon a page.

Comments closed