
Category: Data Types

Wanted: Unsigned Ints

Ewald Cress would like to have unsigned integer types:

Let’s take a look at what is being asked here. Using the 32-bit integer as an example, we currently have a data type that can accept a range between negative two billion and two billion. But if negative numbers aren’t required, we can use those same 32 bits to store numbers between zero and four billion. Why, goes the question, throw away that perfectly useful upper range by always reserving space for a negative range we may not need?

I appreciate Ewald’s thoughtfulness here in working out the value of the request as well as some of the difficulties in building something which fulfills his desire.  Great read.
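
For a concrete sense of the limit in question, here is a quick illustration; the offset workaround in the final comment is a common stopgap of my own choosing, not part of Ewald’s proposal.

-- int covers -2,147,483,648 through 2,147,483,647; there is no unsigned variant.
SELECT CAST(2147483647 AS int) AS MaxInt;   -- succeeds
SELECT CAST(2147483648 AS int) AS TooBig;   -- fails with an arithmetic overflow error
-- Today the options are to widen to bigint (8 bytes) or to store values offset
-- by -2,147,483,648 and shift them back on the way out.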


Uses For Binary Data Types

Daniel Hutmacher explains what binary data types are and one use case:

Because binary values are essentially strings, they easily convert to and from character strings, using CAST or CONVERT. To convert the binary value of 0x41 to a plain-text character value, try

SELECT CAST(0x41 AS char(1)); --- 'A'

The binary value 0x41 is equivalent to decimal 65, and CHAR(65) is the letter “A”. Note that I haven’t placed any quotes around 0x41 – that’s because it’s a numeric value (albeit in hex notation) and not a string.

A few use cases I’ve seen are creating hashes (SHA1 or MD5) for change detection, storing password hashes, and encrypted columns—Always Encrypted uses varbinary data types to store encrypted information, for example.
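
As a quick sketch of the change-detection case: HASHBYTES returns varbinary (32 bytes for SHA2_256), so the hash drops naturally into a binary column. The variable below is just an illustration.

DECLARE @RowContents nvarchar(4000) = N'Some row contents';
-- Returns a 32-byte varbinary value, a natural fit for a varbinary(32) change-detection column.
SELECT HASHBYTES('SHA2_256', @RowContents) AS RowHash;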


Optimizing Large Documents For Space

Raul Gonzalez drops a 2 TB table’s size in half:

So at work, I’d say space matters, and in order to optimize our storage requirements it’s very important to know about SQL Server internals, especially the Storage Engine, which happens to be one of my favorite topics of study.

In my quest to release some space I got to this database, just one table which is 165M of XML documents stored as NVARCHAR(MAX).

It was interesting walking through the process.  Some part of me wonders if it’s a bit complex for the next maintainer to handle, but saving a terabyte of disk space is worth a few extra pages of documentation…
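
For a rough feel of where that kind of saving can come from, one common comparison is the nvarchar(max) text against the xml type’s internal binary format. The table and column names below are hypothetical; Raul’s post walks through the real analysis.

-- Assumes the stored text is well-formed XML. The xml data type keeps documents in an
-- internal binary representation, which is often more compact than the raw nvarchar(max) text.
SELECT TOP (100)
    DATALENGTH(DocBody)              AS BytesAsNvarchar,
    DATALENGTH(CAST(DocBody AS xml)) AS BytesAsXml
FROM dbo.Documents;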


Max Data Types In Queries

Erik Darling shows how variable definition can be important, even without implicit conversion:

SQL Server makes many good and successful attempts at something called predicate pushdown, or predicate pushing. This is where certain filter conditions are applied directly to the data access operation. It can sometimes prevent reading all the rows in a table, depending on index structure and if you’re searching on an equality vs. a range, or something else.

What it’s really good for is limiting data movement. When rows are filtered at access time, you avoid needing to pass them all to a separate operator in order to reduce them to the rows you’re actually interested in. Fun for you! Be extra cautious of filter operators happening really late in your execution plans.

Click through for Erik’s demo.
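
As a hedged sketch of the behavior he describes (the table and column are hypothetical, not Erik’s demo): comparing against a max-typed variable tends to push the filter into a late Filter operator, while a right-sized variable lets the predicate be applied at the seek or scan itself.

DECLARE @NameMax   nvarchar(max) = N'Erik';
DECLARE @NameRight nvarchar(40)  = N'Erik';

-- The comparison to the (max) variable often shows up as a separate, late Filter operator:
SELECT COUNT(*) FROM dbo.Users WHERE DisplayName = @NameMax;

-- The right-sized variable allows the predicate to be evaluated at the data access:
SELECT COUNT(*) FROM dbo.Users WHERE DisplayName = @NameRight;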


LOB On Memory-Optimized Tables

Dmitri Korotkevitch digs into LOB data when you build a memory-optimized table:

There is also considerable overhead in terms of memory usage. Every non-empty off-row value adds 50+ bytes of overhead regardless of its size. Those 50+ bytes consist of three artificial ID values (in-row, off-row in the data row, and the leaf level of the range index) and the off-row data row structure. It is even larger in the case of LOB columns, where data is stored in the LOB Page Allocator.

One of the key points to remember is that the decision of which columns go off-row is made based on the table schema. This is very different from on-disk tables, where such a decision is made on a per-row basis and depends on the data row size. With on-disk tables, data is stored in-row when it fits on the data page.

In-Memory OLTP works in a different way. (Max) columns are always stored off-row. For other columns, if the data row size in the table definition can exceed 8,060 bytes, SQL Server pushes the largest variable-length column(s) off-row. Again, it does not depend on the amount of data you store there.

This is a great article getting into the internals of how memory-optimized tables work in SQL Server 2016, as well as a solid reason to avoid LOB types and very large VARCHAR values on memory-optimized tables if you can.  Absolutely worth a read.
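
Here is a minimal sketch of the schema-level rule Dmitri describes, using a hypothetical SQL Server 2016 memory-optimized table: the (max) column is always stored off-row, and because the defined row size can exceed 8,060 bytes, the engine may also push the largest variable-length column off-row, regardless of what you actually store.

CREATE TABLE dbo.EventLog
(
    EventId   int            NOT NULL PRIMARY KEY NONCLUSTERED,
    ShortNote nvarchar(200)  NOT NULL,   -- stays in-row
    BigNote   nvarchar(4000) NOT NULL,   -- candidate for off-row: the defined row size can exceed 8,060 bytes
    Payload   nvarchar(max)  NOT NULL    -- (max) columns are always stored off-row
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);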


Int To BigInt

Kendra Little walks through the process of expanding an int column into a bigint:

Sometimes you just can’t take the outage. In that case, you’ve got to proceed with your own wits, and your own code. This is tricky because changes are occurring to the table.

The solution typically looks like this:

  • Set up a way to track changes to the table – either triggers that duplicate off modifications or Change Data Capture (Enterprise Edition)

  • Create the new table with the new data type, set identity_insert on if needed

  • Insert data into the new table. This is typically done in small batches, so that you don’t overwhelm the log or impact performance too much. You may use a snapshot from the point at which you started tracking changes.

  • Start applying changed data to the new table

  • Make sure you’re cleaning up from the changed data you’re catching and not running out of space

  • Write scripts to compare data between the old and new tables to make sure you’re really in sync (possibly use a snapshot or a restored backup to compare a still point in time)

  • Cut over in a quick downtime at some point using renames, schema transfer, etc. If it’s an identity column, don’t forget to fix that up properly.

This method matches what I’ve done in zero downtime situations.
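
A rough sketch of the batched copy step from the list above, with hypothetical table and column names; the change tracking and cutover pieces are the harder parts Kendra covers.

DECLARE @BatchSize int = 50000,
        @LastId    int = 0,
        @MaxId     int;

SELECT @MaxId = MAX(OrderId) FROM dbo.Orders;

SET IDENTITY_INSERT dbo.Orders_New ON;   -- only needed if OrderId is an identity column

-- Copy in key-range batches so the transaction log stays manageable.
WHILE @LastId < @MaxId
BEGIN
    INSERT INTO dbo.Orders_New (OrderId, OrderDate, Amount)
    SELECT OrderId, OrderDate, Amount
    FROM dbo.Orders
    WHERE OrderId > @LastId
      AND OrderId <= @LastId + @BatchSize;

    SET @LastId += @BatchSize;
END;

SET IDENTITY_INSERT dbo.Orders_New OFF;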

Also see Aaron Bertrand’s article on the same topic:

In part 3 of this series, I showed two workarounds to avoid widening an IDENTITY column – one that simply buys you time, and another that abandons IDENTITY altogether. The former prevents you from having to deal with external dependencies such as foreign keys, but the latter still doesn’t address that issue. In this post, I wanted to detail the approach I would take if I absolutely needed to move to bigint, needed to minimize downtime, and had plenty of time for planning.

Because of all of the potential blockers and the need for minimal disruption, the approach might be seen as a little complex, and it only becomes more so if additional exotic features are being used (say, partitioning, In-Memory OLTP, or replication).

At a very high level, the approach is to create a set of shadow tables, where all the inserts are directed to a new copy of the table (with the larger data type), and the existence of the two sets of tables is as transparent as possible to the application and its users.

Those are two good posts on this topic.


Unicode

Aaron Bertrand discusses two Unicode schools of thought:

One of the more common dilemmas schema designers face, and I’m realizing it’s something that I should probably spend more time on in my presentations, is whether to use varchar or nvarchar for columns that will store string data. As part of my #EntryLevel challenge, I thought I’d start by writing a bit about this here.

In general, I come across two schools of thought on this:

  1. Use varchar unless you know you need to support Unicode.

  2. Use nvarchar unless you know you don’t.

My preference is to start with Unicode.  15 years ago, you could easily get away with using ASCII for most US-developed systems, but the likelihood that you will need to store data in multiple, varied languages is significantly higher today.  And having already refactored one application to support Unicode after it became fairly large, I’d rather not do that again…
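
A small illustration of what is at stake, assuming a typical Latin collation: varchar silently replaces characters outside the column’s code page, while nvarchar round-trips them at twice the storage for most Western text.

SELECT CAST(N'Привет' AS varchar(20))  AS AsVarchar,   -- comes back as '??????'
       CAST(N'Привет' AS nvarchar(20)) AS AsNvarchar;  -- keeps the Cyrillic intact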


Altering Columns In Large Tables

Kenneth Fisher discusses a problem he had with altering a column on a large table:

My first attempt was just a straight ALTER TABLE ALTER COLUMN. After about an hour I got back a log full error. I then tried a 200 GB log and a 350 GB log. These failed at 3 and 5 hours. While this was going on I checked on #sqlhelp to see if anyone knew any way to minimize the log usage so my command would finish.

The primary suggestions were:

  • Add a new column to the end of the table, populate it in batches, then remove the old column.
  • Create a new table, populate it, index it, drop the old table, re-name the new table.

I will say that I have used suggestion #1 several times, particularly in zero down-time situations.  Once you’re done backfilling the column, you can drop the old one and rename the new one in a single transaction.  Read on for Kenneth’s solution.
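
Here is a hedged sketch of suggestion #1 with hypothetical names: add the new column, backfill it in batches to keep log growth bounded, then drop and rename in a short transaction.

ALTER TABLE dbo.BigTable ADD ValueNew bigint NULL;
GO

DECLARE @BatchSize int = 100000;

-- Backfill in batches; each loop iteration is its own transaction.
WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) dbo.BigTable
    SET    ValueNew = ValueOld
    WHERE  ValueNew IS NULL
      AND  ValueOld IS NOT NULL;

    IF @@ROWCOUNT = 0 BREAK;
END;

-- Swap the columns once the backfill is complete.
BEGIN TRANSACTION;
    ALTER TABLE dbo.BigTable DROP COLUMN ValueOld;
    EXEC sp_rename 'dbo.BigTable.ValueNew', 'ValueOld', 'COLUMN';
COMMIT;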


Casting Oracle Number To Numeric

Jon Morisi points out a rounding issue when casting Oracle’s Number data type to SQL Server’s Numeric:

Perhaps the 2014 SQL Server is implicitly converting to float, rounding to the nearest even, prior to the explicit cast to Numeric.  However, how the scale (the number of decimal digits stored to the right of the decimal point) would be determined in such a scenario is a conundrum.  Either way, although the mapping is defined the same, the behavior demonstrated between the two versions of SQL Server is inconsistent.
Research into both the ANSI and IEEE standards boils down to the same conclusion: truncation and/or rounding is implementation defined.

It’s an interesting issue.  Read on for more details.
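
The flavor of the problem is easy to reproduce locally with a hypothetical example (Jon’s case involved an Oracle linked server, which this does not touch; it only shows how a detour through float can change the rounding):

DECLARE @v varchar(20) = '1.0045';

SELECT CAST(@v AS numeric(10, 3))                AS DirectCast,  -- 1.005: string-to-numeric rounds the decimal digits
       CAST(CAST(@v AS float) AS numeric(10, 3)) AS ViaFloat;    -- typically 1.004, since 1.0045 has no exact float representation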
