Partitioning Nullable Columns

Kenneth Fisher looks at what happens when you use a nullable column as a partition key:

So to start with how does partitioning handle a NULL? If you look in the BOL for the CREATE PARTITION FUNCTION you’ll see the following:

Any rows whose partitioning column has null values are placed in the left-most partition unless NULL is specified as a boundary value and RIGHT is indicated. In this case, the left-most partition is an empty partition, and NULL values are placed in the following partition.

So basically NULLs are going to end up in the left most partition(#1) unless you specifically make a partition for NULL and are using a RIGHT partition. So let’s start with a quick example of where NULL values are going to end up in a partitioned table (a simple version).

Click through to see Kenneth’s proof and the repercussions of making that partitioning column nullable.

Benefits Of Deprecated Data Types

Raul Gonzalez shows how to get one of the benefits of older, deprecated data types using (MAX) data types:

We can see that our table is managed by two different allocation units, IN_ROW_DATA and LOB_DATA, which means that all data within columns of the data types above, will end up in different pages by default, regardless of the size of the data.

This is the default behaviour for old LOB types, to be stored separately, but new LOB types (MAX) by default will try to get them In-Row if they are small enough to fit.

Having some of those documents In-Row will result in a serious increase in the number of pages to scan, therefore affecting performance.

Note that for the table scan we have used only the IN_ROW_DATA pages, making it much lighter than if we have to scan the sum of all pages.

This might be helpful for some situations, like where you rarely need to get to the LOB data.

Data Type Mismatches

Kendra Little gets into why certain data type mismatches force scans of tables while others can still allow seeks:

Sometimes we get lucky comparing a literal value to a column of a different type.

But this is very complicated, and joining on two columns of different types in the same family without explicitly converting the type of one of the columns resulted in worse performance in Paul White’s tests, when the columns allowed NULLs! (Note: I haven’t rerun those tests on 2016, but I think the general advice below still applies.)

General advice: don’t rely on being lucky. Pay attention to your data types, and compare values of the same data type wherever possible.

That’s great advice.

Inconsistencies With SQL_VARIANT

Erik Darling warns against using SQL_VARIANT data types:

I half-stumbled on the weirdness around SQL_VARIANT a while back while writing another post about implicit conversion. What I didn’t get into at the time is that it can give you incorrect results.

When I see people using SQL_VARIANT, it’s often in dynamic SQL, when they don’t know what someone will pass in, or what they’ll compare it to. One issue with that is you’ll have to enclose most things in single quotes, in case a string or date is passed in. Ever try to hand SQL those without quotes? Bad news bears. You don’t get very far.

Read on for the demo.  I have never used SQL_VARIANT in any project.  I’ve done a lot of crazy things with SQL Server (some of them intentionally) but never this.

SSRS Category Charts & Ints

Kathi Kellenberger notices an oddity with SSRS Mobile Report category charts:

Notice that OrderYear displays decimal points. I switched the dataset in the Series field name property, and found that neither of the columns in the dataset can be used.

Numeric columns cannot be set as a Series name field. To work around this, I modified the dataset, casting OrderYear as a CHAR(4).

That’s not a great situation, but at least there’s a workaround.

Which Data Types Can Create Statistics?

Raul Gonzalez figures out which data types cannot be part of statistics:

Yeah, there you go, all these _WA_Sys_ stats tell me they have been automatically created (there is a flag in sys.stats if you don’t believe me) but I can see there are only 31, where I created 34 columns.

That’s funny, let’s see which data types did get statistics.

The results are pretty interesting.

Wanted: Unsigned Ints

Ewald Cress would like to have unsigned integer types:

Let’s take a look at what is being asked here. Using the 32-bit integer as an example, we currently have a data type that can accept a range between negative two billion and two billion. But if negative numbers aren’t required, we can use those same 32 bits to store numbers between zero and four billion. Why, goes the question, throw away that perfectly useful upper range by always reserving space for a negative range we may not need?

I appreciate Ewald’s thoughtfulness here in working out the value of the request as well as some of the difficulties in building something which fulfills his desire.  Great read.

Uses For Binary Data Types

Daniel Hutmacher explains what binary data types are and one use case:

Because binary values are essentially strings, they easily convert to and from character strings, using CAST or CONVERT. To convert the binary value of 0x41 to a plain-text character value, try

SELECT CAST(0x41 AS char(1)); --- 'A'

The binary value 0x41 is equivalent to decimal 65, and CHAR(65) is the letter “A”. Note that I haven’t placed any quotes around 0x41 – that’s because it’s a numeric value (albeit in hex notation) and not a string.

A couple use cases I’ve seen are creating hashes (SHA1 or MD5) for change detection, storing password hashes, and encrypted columns—Always Encrypted uses varbinary data types to store encrypted information, for example.

Optimizing Large Documents For Space

Raul Gonzalez drops a 2 TB table’s size in half:

So at work, I’d say space matters, and in order to optimize our storage requirements it’s very important to know about SQL Server internals, specially the Storage Engine, which happens to be one of my favorite topics of study.

In my quest to release some space I got to this database, just one table which is 165M of XML documents stored as NVARCHAR(MAX).

It was interesting walking through the process.  Some part of me wonders if it’s a bit complex for the next maintainer to handle, but saving a terabyte of disk space is worth a few extra pages of documentation…

Max Data Types In Queries

Erik Darling shows how variable definition can be important, even without implicit conversion:

SQL Server makes many good and successful attempts at something called predicate pushdown, or predicate pushing. This is where certain filter conditions are applied directly to the data access operation. It can sometimes prevent reading all the rows in a table, depending on index structure and if you’re searching on an equality vs. a range, or something else.

What it’s really good for is limiting data movement. When rows are filtered at access time, you avoid needing to pass them all to a separate operator in order to reduce them to the rows you’re actually interested in. Fun for you! Be extra cautious of filter operators happening really late in your execution plans.

Click through for Erik’s demo.


May 2017
« Apr