Finding Partition Boundaries

Kenneth Fisher shows how to find the min and max values for a partition:

So what does it do? Per BOL

Returns the partition number into which a set of partitioning column values would be mapped for any specified partition function in SQL Server 2016.

So it basically tells us which partition any given row is in. This can be particularly handy at times. For example, if you want to know the min and max values of a column per partition.

Read on for a couple scripts which use $Partition.

Partitioning Nullable Columns

Kenneth Fisher looks at what happens when you use a nullable column as a partition key:

So to start with how does partitioning handle a NULL? If you look in the BOL for the CREATE PARTITION FUNCTION you’ll see the following:

Any rows whose partitioning column has null values are placed in the left-most partition unless NULL is specified as a boundary value and RIGHT is indicated. In this case, the left-most partition is an empty partition, and NULL values are placed in the following partition.

So basically NULLs are going to end up in the left most partition(#1) unless you specifically make a partition for NULL and are using a RIGHT partition. So let’s start with a quick example of where NULL values are going to end up in a partitioned table (a simple version).

Click through to see Kenneth’s proof and the repercussions of making that partitioning column nullable.

Partitioned Columnstore Tables

Denny Cherry makes an important point about dealing with columnstore tables:

ColumnStore indexes are all the rage with data warehouses. They’re fast, they’re new(ish) and they solve all sorts of problems when dealing with massive amounts of data.  However they can cause some issues as well if you aren’t very careful about how you setup your partitions on the ColumnStore index.  This is because, you can’t split a ColumnStore partition once it contains data.

Now, if everything is going according to plan you create your partitions well in advance and there’s no issues.

However, if everything hasn’t gone according to plan and someone forgets to create the partitions and you end up with rows in the final partition, you can’t create any more partitions because you can’t split the partition.

Ideally, you get those ducks in a row first.  Keep reading for a repro script and a couple potential workarounds.

Columnstore Partitioning In SQL Server 2016

Niko Neugebauer demonstrates some performance improvements to partitioned columnstore indexes in SQL Server 2016:

266ms was the partitioned table under SQL Server 2016 (compatibility level 120) while 353ms of the total elapsed time was obtained on SQL Server 2014! This represents a solid 25% improvement

All execution plans will have the same iterators, but will differ on the overall estimated cost (the non-partitioned queries will be way lower than the partitioned ones), as well as the distribution of the estimated costs within the execution plan, but as for the rest – it will be quite similar, like the one shown on the image below:

These improvements were swamped by the aggregate predicate pushdown improvements in 2016, at least in Niko’s example, but I’ll take a free 25%-33% performance improvement.

Partitioning Basics

Kim Tripp explains partitioning:

So, you’d still need to determine if this is the right approach. But, the main point – partitioning really isn’t designed to give incredible gains to your queries. It’s meant to be better for data management and maintenance. However, some partitioning designs can lead to query performance benefits too.

This is a nice introduction and makes a good point:  performance benefits to partitioning are incidental to the real benefit, which is simplicity of administration.

Viewing Partitions

Kendra Little has a query to view partitions:

This helps make sure that you’re designing your tables correctly, and it also helps you avoid goofs like merging the wrong boundary point and causing a bunch of data to move into another– which can be slow and painful.

All this information is available in TSQL, it’s just an ugly query, and it doesn’t come in any built-in reports or views.

So I’ve got an ugly query for you!

Having a script like this is very helpful if you use partitioning for anything.

Switching Instead Of Renaming Tables

Kendra Little has an interesting solution to when you need to swap out an old table for a new version:

This pattern works in SQL Server 2014 and higher. And it even works in Standard Edition of 2014.

Some folks will see the word ‘Switch’ in this pattern and assume the pattern that I’m suggesting is Enterprise Edition only for versions before SQL Server 2016 SP1.

However, oddly enough, you can use partition switching even in Standard Edition, as long as the tables only have one partition.

And all rowstore tables have at least one partition! That happens automagically when you create a table.

Read the whole thing.

Deleting SSAS Partitions

Chris Koester shows how to use TMSL and Powershell to delete an Analysis Services tabular model partition:

The sample script below shows how this is done. The sequence command is used to delete multiple partitions in a single transaction. This is similar to the batch command in XMLA. In this example we’re only performing delete operations, but many different operations can be performed in sequence (And some in parallel).

Click through for a description of the process as well as a script to do the job.

Columnstore Partitioning

Niko Neugebauer warns against partitioning small tables with clustered columnstore indexes:

Needless to say that looking at the execution plans you notice that the actual execution plan shows 10 times difference between them, even though both tables contain the very same data!
The query cost for the partitioned table is staggering – it is around 10 times bigger (~8.8) vs (~0.81) for the first query.
The execution times reflect in part this situation: 12 ms vs 91 ms. Non-partitioned table performs almost 9 times faster overall and the spent CPU time is reflecting it: 15 ms vs 94 ms. Remember, that both tables are Columnstore Indexes based ! Partitioning your table in a wrong way will contain a huge penalty that might not be directly detectable through the execution plan of the complex queries. Well, you might want to use the CISL, just saying

If you can’t fill a single rowgroup, your partition is too granular.  Even then, I’d like to see double-digit rowgroups per partition, though that’s just me.

Partitioned Views With Polybase

I look at using SQL 2000-style partitioning with a set of external tables:

Using a view, we were able to create a “partitioned” Polybase experience, similar to what we had in SQL Server 2000.  This form of poor man’s partitioning allows us to segment out data sets and query them independently, something which can be helpful when storing very large amounts of data off-site and only occasionally needing to query it.  The thing to remember, though, is that if you store this in Azure Blob Storage, you will need to pull down the entire table’s worth of data to do any processing.

This leads to a concept I first heard from Ginger Grant:  pseudo-StretchDB.  Instead of paying for what Stretch offers, you get an important subset of the functionality at a much, much lower price.  If you do store the data in Azure Blob Storage, you’re paying pennies per gigabyte per month.  For cold storage, like a scenario in which you need to keep data around to keep the auditors happy but your main application doesn’t use that information, it can work fine.  But if you need to query this data frequently, performance might be a killer.

For Polybase tables without the ability to perform external pushdown, coming up with a good partitioning strategy is probably one of the two best ways to improve performance, with creating a Polybase scale-out cluster the other method.

Categories

May 2017
MTWTFSS
« Apr  
1234567
891011121314
15161718192021
22232425262728
293031