Press "Enter" to skip to content

Category: Data Types

Performance Impacts of Computed Columns

Robert Sheldon takes us through a few scenarios:

In this article, I walk you through the process of applying these strategies so you have a better sense of the available options. For the examples, I created four similar tables and populated them with identical data, which comes from the WideWorldImporters sample database. Each table includes the same computed column, with the column persisted in two tables and indexed in two tables, resulting in the following mix:

– The Orders1 table includes a non-persisted computed column.
– The Orders2 table includes a persisted computed column.
– The Orders3 table includes an indexed, non-persisted computed column.
– The Orders4 table includes an indexed, persisted computed column.

For each table, I show you the execution plan that’s generated when querying the computed column. The column’s expression is a relatively simple one, and the data set very small. Even so, this should be enough to demonstrate the principles of creating persistent and indexed computed columns and how they can help address performance-related issues.

Click through for Robert’s results.

Comments closed

How SQL Server Stores XML Data

Randolph West continues a series on internal storage of data types:

The SQL Server XML data type supports both schema-defined and schema-less XML documents (typed and untyped), and partial documents. Partial documents are still considered well-defined XML, but don’t have surrounding <?xml>...</xml> tags for example. Either way, the storage engine validates the data when inserting or updating into an XML column.

The XML data type lets us store up to 2GB per column per row, but it’s prudent here to note that just because we can, it doesn’t mean we should.

Click through for the details.

Comments closed

DATETIME2 and Storage Size

Randolph West digs into an issue:

Two years ago I wrote a post that got a lot of traction in the comments at the time. Last month there was renewed interest because one of the commenters noted that the official SQL Server documentation for DATETIME2 disagreed with my assertions, and that I was under-representing the storage requirements.

To remind you, I have been saying for years that you can use DATETIME2(3) as a drop-in replacement for DATETIME, and have better granularity (1ms versus 3ms) for 12.5% less storage (1 byte per column per row). The commenter intimated that because my statement conflicted with the documentation, that I must be wrong. As it turns out the documentation was wrong, but I also learned something new in the process!

It’s an interesting internal look at how difficult it is to get documentation right, even on something which sounds simple.

Comments closed

When Altering a Table Blocks the Log Reader

Aaron Bertrand walks us through a painful scenario:

We recently performed a DDL operation against a SQL Server table – simply increasing the size of a varchar column – which should have been instantaneous. Instead, we killed the SQL Server process after observing 20 minutes of HARD_SYNC_COMMIT waits and a blocked replication log reader. Could this issue have been avoided? What went wrong?

I spotted the issue pretty quickly, but it’s easy to miss in a code review. Read the whole thing.

Comments closed

When Date Tables Go Bad

Brent Ozar walks through a scenario in which a calendar table (AKA, date dimension) makes a query perform quite a bit worse:

So why did the date table not perform as well as the old-school way?

SQL Server doesn’t understand the relationship between these two tables. It simply doesn’t know that all of the rows in the Users table will match up with rows in the calendar table. It assumes that the calendar table is doing some kind of filtering, especially given our CAST on the date. It doesn’t expect all of the rows to match.

My reaction was pretty much the same as Koen Verbeeck’s in the comments. Put in clearer terms, calendar tables work best when you’re joining a DATE type to a DATE type. Once you introduce times into the mix, the optimizer has to behave differently, not least because you have to do things like CAST() to coerce data types.

Comments closed

When NULL Makes Sense

Chris Johnson shares some rules of thumb on when allowing a column to be nullable makes sense:

Now I want to talk a bit about when we should be using NULLs and when we shouldn’t, inspired again by some things I’ve seen in legacy code a few days ago. And when I say using, I mean when should we allow values to be NULL, in tables, and parameters, and anything else.

The first thing to remember is what does a NULL represent. It’s very simple, NULL means “I don’t know”. If you keep that in mind, and you’re practising some good development design where everything has a point, you should be able to tell if it’s possible to be unsure about any particular value for a row.

One of the tricky parts about NULL values and legacy code is that if I need to add a new column and there is not a good default, either I make the column nullable (regardless of whether it should be) or I take downtime by blocking table access until my change is in place. As a result, quite often, I simply need to make something nullable because I can’t afford to block the table that long.

Comments closed

Expanding INT to BIGINT Without (Much) Downtime

Michael J. Swart finds a creative way around a problem, and then a creative way around the problem it causes, and so on:

The table has over 2 billion rows and it looks like it’s going to run out of space soon because the LogId column is defined as an INT. I need to change this table so that it’s a BIGINT. But changing an INT to a BIGINT is known as a “size of data” operation. This means SQL Server has to process every row to expand the LogId column from 4 to 8 bytes. But it gets trickier than that.

The biggest challenge is that the table has to remain “online” (available for queries and inserts).

Compression?
Gianluca Sartori (spaghettidba) had the idea of enlarging the columns with no downtime using compression. It’s promising, but I discovered that for this to work, all indexes need to be compressed not just the ones that contain the changed column. Also, any indexes which use the column need to be disabled for this to work.

Cheating
I gave up on solving this problem in general and constrained my focus to the specific problem I was facing. There’s always some context that lets us bend the rules. In my case, here’s what I did.

Read the whole thing. Also, as soon as I saw Michael’s title, I immediately thought of this:

Comments closed

NVARCHAR Everywhere

I get to put on my contrarian hat:

In the last episode of Shop Talk, I laid out an opinion which was…not well received. So I wanted to take some time and walk through my thinking a little more cogently than I was able to do during Shop Talk.

Here’s the short version. When you create a table and need a string column, you have a couple options available: VARCHAR and NVARCHAR. Let’s say that you’re a developer creating a table to store this string data. Do you choose VARCHAR or NVARCHAR? The classic answer is, “It depends.” And so I talk about why that is in video format right below these words.

I have a video which goes into detail, plus a bunch of words. Plus mice and banjos. 🐭🪕

Comments closed

How SQL Server Stores UNIQUEIDENTIFIERS

Randolph West digs into what a UNIQUEIDENTIFIER looks like in storage:

Let’s take our example GUID again: CC05E271-BACF-4472-901C-957568484405. If we look at the table storage for this row, we’ll find it persisted as follows: 0x71E205CCCFBA7244901C957568484405 (alternating octets are highlighted in bold).

If you haven’t been following this series, this is a good place to remind you that SQL Server stores data using little-endian sequencing on disk and in memory. In the vast majority of cases, bytes are stored in reverse order because that’s how Intel CPUs like their data. However GUIDs are persisted slightly differently because of their sort order.

This is probably the most GUIDs I’ve seen in a single blog post.

Comments closed

How SQL Server Stores Floating Point Types

Randolph West continues a series on SQL Server data type storage:

If an integer or decimal amount is a precise representation of a value, a floating point is the closest approximation of that value in binary. Programming languages and databases use floating point numbers to trade storage (and memory) costs against precision. A floating point value is imprecise, but even that is underselling the problem.

Randolph also breaks all of the rules and writes out the largest FLOAT value you can have.

Comments closed