Now that I've rambled a bit, you want to know why grabbing the min and max of the primary key on a partitioned table takes soooo long, and how to fix it. Theoretically, you would expect SQL Server to perform the following steps to grab the max Id:
- Grab the Max Id from each partition using a seek
- Hold the results in temp storage
- Get the Max ID from the temp storage, and return that result.
However, SQL Server doesn't do that. It scans each partition, finding the max Id only after it has examined every record in each one. This is very inefficient: it can kill a query that depends on this value, and it can hurt a busy server that's low on physical resources. So what we need to do is manually write the code to perform the steps SQL Server should actually be doing.
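The manual approach can be sketched roughly like this (not necessarily Ken's exact workaround; the table `dbo.Orders` and partition function `pf_OrdersRange` are hypothetical names for a table partitioned on its `Id` primary key):

```sql
-- Probe each partition individually so each MAX() can be satisfied with a
-- seek inside that partition, then take the MAX of the per-partition results.
SELECT MAX(x.MaxId) AS MaxId
FROM sys.partitions AS p
CROSS APPLY (
    SELECT MAX(o.Id) AS MaxId
    FROM dbo.Orders AS o
    WHERE $PARTITION.pf_OrdersRange(o.Id) = p.partition_number
) AS x
WHERE p.object_id = OBJECT_ID(N'dbo.Orders')
  AND p.index_id IN (0, 1);   -- heap or clustered index rows only
```

This mirrors the three steps above: one seek per partition, intermediate results held by the CROSS APPLY, and a final MAX over them.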
Read on for one workaround Ken uses to deal with this inefficiency.
I needed to query SQL Server's metadata about partitioned tables, especially the column and the partition scheme used for partitioning. The former is quite nicely documented in the SQL Server documentation (see link below), but the latter is not (yet). I have written to the team about this; hopefully the documentation will be updated. Until then, I wrote this blog post to help others searching for an answer.
Click through for the script.
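As a rough sketch of the kind of catalog query involved (this is an assumption-laden outline, not the script from the post), the partitioning column is the one with `partition_ordinal = 1` in `sys.index_columns`:

```sql
-- For each partitioned table: its partition scheme, partition function,
-- and partitioning column.
SELECT t.name  AS table_name,
       ps.name AS partition_scheme,
       pf.name AS partition_function,
       c.name  AS partitioning_column
FROM sys.tables AS t
JOIN sys.indexes AS i
      ON i.object_id = t.object_id AND i.index_id IN (0, 1)
JOIN sys.partition_schemes AS ps
      ON ps.data_space_id = i.data_space_id
JOIN sys.partition_functions AS pf
      ON pf.function_id = ps.function_id
JOIN sys.index_columns AS ic
      ON ic.object_id = i.object_id
     AND ic.index_id  = i.index_id
     AND ic.partition_ordinal = 1
JOIN sys.columns AS c
      ON c.object_id = t.object_id AND c.column_id = ic.column_id;
```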
A friend had an interesting problem today: a really big table (multiple millions of rows) and no primary key. He then ran into an issue where he had to have one. The easiest thing is to add a new int identity column, right? Unfortunately, in this case, because of the size of the table, the log growth from adding an identity column was too much. So what to do?
Well, it would be nice if we could add an int column, populate it in chunks, and then make it an identity column. Unfortunately, you can't add the identity property to an existing column.
Read on for the answer.
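The "populate it in chunks" half, at least, is straightforward; one way to do it (not necessarily the post's answer — `dbo.BigTable` and the sequence name are hypothetical, and `SEQUENCE` needs SQL Server 2012 or later) looks like this:

```sql
-- A sequence stands in for the identity generator during the backfill.
CREATE SEQUENCE dbo.seq_BigTableId AS int START WITH 1 INCREMENT BY 1;

ALTER TABLE dbo.BigTable ADD NewId int NULL;
GO

-- Backfill in small batches so each transaction logs only 50,000 rows,
-- keeping log growth (and blocking) manageable.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (50000) dbo.BigTable
    SET    NewId = NEXT VALUE FOR dbo.seq_BigTableId
    WHERE  NewId IS NULL;
    SET @rows = @@ROWCOUNT;
END;
```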
So what does it do? Per BOL
Returns the partition number into which a set of partitioning column values would be mapped for any specified partition function in SQL Server 2016.
So it basically tells us which partition any given row is in. This can be particularly handy at times. For example, if you want to know the min and max values of a column per partition.
Read on for a couple scripts which use $Partition.
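For instance, grouping by `$PARTITION` gives per-partition min/max directly (hypothetical table and function names):

```sql
-- dbo.Orders is assumed partitioned on OrderDate by pf_OrdersRange.
SELECT $PARTITION.pf_OrdersRange(o.OrderDate) AS partition_number,
       MIN(o.OrderDate) AS min_value,
       MAX(o.OrderDate) AS max_value,
       COUNT(*)         AS row_count
FROM dbo.Orders AS o
GROUP BY $PARTITION.pf_OrdersRange(o.OrderDate)
ORDER BY partition_number;
```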
So to start with, how does partitioning handle a NULL? If you look in the BOL entry for CREATE PARTITION FUNCTION, you'll see the following:
Any rows whose partitioning column has null values are placed in the left-most partition unless NULL is specified as a boundary value and RIGHT is indicated. In this case, the left-most partition is an empty partition, and NULL values are placed in the following partition.
So basically, NULLs are going to end up in the left-most partition (#1) unless you specifically make a partition for NULL and are using a RIGHT partition. So let's start with a quick, simple example of where NULL values end up in a partitioned table.
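The two cases from the BOL quote can be checked directly with `$PARTITION` (hypothetical function names; the expected partition numbers follow from the documented behavior):

```sql
-- RANGE LEFT: NULLs fall into the left-most partition (#1).
CREATE PARTITION FUNCTION pf_NullDemoLeft (int)
AS RANGE LEFT FOR VALUES (100, 200);

SELECT $PARTITION.pf_NullDemoLeft(NULL);   -- 1

-- RANGE RIGHT with NULL as a boundary value: partition 1 stays empty
-- and NULLs land in the following partition (#2).
CREATE PARTITION FUNCTION pf_NullDemoRight (int)
AS RANGE RIGHT FOR VALUES (NULL, 100, 200);

SELECT $PARTITION.pf_NullDemoRight(NULL);  -- 2
```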
Click through to see Kenneth’s proof and the repercussions of making that partitioning column nullable.
ColumnStore indexes are all the rage with data warehouses. They're fast, they're new(ish), and they solve all sorts of problems when dealing with massive amounts of data. However, they can cause some issues as well if you aren't very careful about how you set up your partitions on the ColumnStore index, because you can't split a ColumnStore partition once it contains data.
Now, if everything is going according to plan, you create your partitions well in advance and there are no issues.
However, if things haven't gone according to plan and someone forgot to create the partitions, you end up with rows in the final partition and can't create any more partitions, because you can't split that partition.
Ideally, you get those ducks in a row first. Keep reading for a repro script and a couple potential workarounds.
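The failing statement, in outline, is the usual boundary split (hypothetical names; the function is assumed to back a table with a clustered columnstore index):

```sql
-- Mark a filegroup for the new partition, then try to add the boundary.
ALTER PARTITION SCHEME ps_FactRange NEXT USED [PRIMARY];

-- If the partition this boundary falls into already holds rows,
-- the SPLIT is rejected: only empty columnstore partitions can be split.
ALTER PARTITION FUNCTION pf_FactRange() SPLIT RANGE ('2025-01-01');
```

Hence the advice above: create the boundaries ahead of the data, while the rightmost partition is still empty.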
The partitioned table ran in 266ms under SQL Server 2016 (compatibility level 120), while the total elapsed time on SQL Server 2014 was 353ms. This represents a solid 25% improvement.
All execution plans will have the same iterators but will differ in the overall estimated cost (the non-partitioned queries will be way lower than the partitioned ones), as well as in the distribution of the estimated costs within the execution plan; otherwise they will be quite similar, like the one shown in the image below:
These improvements were swamped by the aggregate predicate pushdown improvements in 2016, at least in Niko’s example, but I’ll take a free 25%-33% performance improvement.
So, you’d still need to determine if this is the right approach. But, the main point – partitioning really isn’t designed to give incredible gains to your queries. It’s meant to be better for data management and maintenance. However, some partitioning designs can lead to query performance benefits too.
This is a nice introduction and makes a good point: performance benefits to partitioning are incidental to the real benefit, which is simplicity of administration.
This helps make sure that you're designing your tables correctly, and it also helps you avoid goofs like merging the wrong boundary point and causing a bunch of data to move into another partition, which can be slow and painful.
All this information is available in TSQL, it’s just an ugly query, and it doesn’t come in any built-in reports or views.
So I’ve got an ugly query for you!
Having a script like this is very helpful if you use partitioning for anything.
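Even a bare-bones version of such a query is handy; this sketch (an assumption, not the post's script) lists every partition function's boundary points in order:

```sql
SELECT pf.name                     AS partition_function,
       pf.boundary_value_on_right  AS is_range_right,
       prv.boundary_id,
       prv.value                   AS boundary_value
FROM sys.partition_functions AS pf
JOIN sys.partition_range_values AS prv
      ON prv.function_id = pf.function_id
ORDER BY pf.name, prv.boundary_id;
```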
This pattern works in SQL Server 2014 and higher. And it even works in Standard Edition of 2014.
Some folks will see the word ‘Switch’ in this pattern and assume the pattern that I’m suggesting is Enterprise Edition only for versions before SQL Server 2016 SP1.
However, oddly enough, you can use partition switching even in Standard Edition, as long as the tables only have one partition.
And all rowstore tables have at least one partition! That happens automagically when you create a table.
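In outline, the pattern looks something like this (all names hypothetical; SWITCH requires matching column types, nullability, and clustered index structure, but the IDENTITY property is allowed to differ, which is what makes the trick work):

```sql
-- A staging table identical to dbo.BigTable, except Id carries IDENTITY.
CREATE TABLE dbo.BigTable_Staging (
    Id      int IDENTITY(1, 1) NOT NULL,
    Payload varchar(100)       NOT NULL
);

-- Metadata-only: the rows move instantly, with minimal logging.
ALTER TABLE dbo.BigTable SWITCH TO dbo.BigTable_Staging;

EXEC sp_rename 'dbo.BigTable',         'BigTable_Old';
EXEC sp_rename 'dbo.BigTable_Staging', 'BigTable';

-- Correct the identity seed so new inserts start past existing values.
DBCC CHECKIDENT ('dbo.BigTable', RESEED);
```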
Read the whole thing.