I recently received a terrific question about table partitioning:
I’m writing up a proposal for my company to start partitioning a 2.5TB table. The idea/need is to control the size of the table while saving the old data. The old data will be moved to an archive database on a different server where the BI guys work with it.
In none of the videos or articles I’ve seen is there an explanation of how the rolling partition works on a long-term daily basis.
- Are the partitions reused, like in a ROUND ROBIN fashion?
- Or, do you add new partitions each day with new filegroups and drop the oldest partition off – this would be FIFO?
Lots of folks assume the answer here is always #2, simply because there’s a bunch of sample code out there for it.
But option #1 can be simpler to manage when it fits your data retention technique!
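A minimal sketch of the reuse approach, assuming a hypothetical daily-partitioned table `dbo.Sales` and a staging table `dbo.Sales_Staging` with the same schema on the same filegroup (all names here are illustrative, not from the article):

```sql
-- Metadata-only: move the oldest partition's rows out of the main table
ALTER TABLE dbo.Sales
    SWITCH PARTITION 1 TO dbo.Sales_Staging;

-- Archive dbo.Sales_Staging to the BI server here (BCP, SSIS, etc.),
-- then empty it so the cycle can repeat
TRUNCATE TABLE dbo.Sales_Staging;

-- Partition 1 is now empty and can be reused for a future date range
-- once the partition function's boundary values are adjusted.
```

The point is that the set of partitions stays fixed; only the data and boundary values rotate through them.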
Click through to learn more about reusable partitioning.
When working with partitioning, the SWITCH operation has to be my favourite. The ability to move a large amount of data from one table to another as a metadata-only operation is absolutely fantastic.
What’s also cool is that we can switch data into a non-partitioned table. Makes life a bit easier not having to manage two sets of partitions!
However, there is a bit of a gotcha when doing this. Let’s run through a quick demo.
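Switching into a non-partitioned table looks something like this sketch, with hypothetical names (`dbo.Orders`, `dbo.Orders_Archive`) standing in for the article's demo objects:

```sql
-- dbo.Orders_Archive is a plain, non-partitioned table with an identical
-- schema, created on the same filegroup as the source partition.
ALTER TABLE dbo.Orders
    SWITCH PARTITION 3 TO dbo.Orders_Archive;
```

The schemas must match exactly for the switch to succeed, which is where the gotcha in the demo comes in.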
Read on for more.
Looking at that data page, we can see that SQL Server has added a UNIQUIFIER column. This is standard behaviour: SQL Server does this to all non-unique clustered indexes, whether they are on a partitioned table or not.
But also look at the CreatedDate column. It appears after the ID column on the page. If this were a non-partitioned table, we would see it after ColA & ColB (the order of the columns in the table definition). This has happened because SQL Server has implicitly added the partitioning key to the index definition, which has changed the physical order of the data on the page.
Read the whole thing.
SQL Server has a feature for partitioning tables and indexes. Partitioning can be implemented at many levels, however: you can create multiple tables for one logical data set, you can split the set into multiple databases, and you can even split it among different servers. Sharding is another term for this; it refers to partitioning data in order to scale compute resources and storage out horizontally.
There are different methods of handling sharding, but each of them needs a central point of connection to handle querying the data on the other shards. This is typically called the control node. The method I’ll discuss today is one where linked servers are used to connect the various shards.
This is useful for something like offloading old invoices which you rarely need to a separate server. Derik also shows that the optimizer can, if it knows your partitioning rules, avoid unnecessary cross-server hits.
So, let’s pretend it’s April 2017 and this is the partition currently being populated. Based on the query above, aside from the current partition bucket, we also have one more available bucket, for May.
Say we want to maintain 3 available buckets at any given time. The next available bucket is May, so that means we need 2 more partitions to cover for June and July.
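Adding those two empty buckets is a pair of SPLIT operations; this sketch assumes hypothetical names `pf_Monthly`, `ps_Monthly`, and filegroup `FG_2017`:

```sql
-- NEXT USED names the filegroup that will receive each new partition;
-- it must be set before every SPLIT.
ALTER PARTITION SCHEME ps_Monthly NEXT USED [FG_2017];
ALTER PARTITION FUNCTION pf_Monthly() SPLIT RANGE ('2017-06-01');

ALTER PARTITION SCHEME ps_Monthly NEXT USED [FG_2017];
ALTER PARTITION FUNCTION pf_Monthly() SPLIT RANGE ('2017-07-01');
```

Splitting empty buckets ahead of time like this is cheap; splitting a populated partition is not.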
Read on for more, including some scripts that you can automate.
In the Chicago perf class last month, we had a student ask if partition level locks would ever escalate to a table level lock. I wrote up a demo and everything, but we ran out of time before I could go over it.
Not that I’m complaining — partitioning, and especially partition level locking, can be pretty confusing to look at.
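For context, partition-level lock escalation is opt-in per table. A minimal sketch, with a hypothetical table name:

```sql
-- AUTO permits escalation to the partition (HoBT) level on partitioned
-- tables; the default, TABLE, escalates straight to a table lock.
ALTER TABLE dbo.BigTable SET (LOCK_ESCALATION = AUTO);
```

Whether a partition-level lock can then escalate further to a table lock is exactly the question the demo answers.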
Click through for that demo and explanation.
The common, by-the-book approach recommends dropping the columnstore index, splitting or merging partitions, and recreating the index afterwards. As you can imagine, this leads to an extremely inefficient process with a huge amount of unnecessary overhead on large tables. After all, you have to drop and recreate the columnstore index, converting the table to a heap, when just a subset of the partitions needs to be rebuilt. Fortunately, you can minimize the overhead with a simple workaround:
1. Switch the partition(s) to split or merge to a separate staging table.
2. Split or merge the partition(s) in the main table. You can do that because the partitions will be empty after the previous step.
3. Drop the columnstore index in the staging table, split/merge the partition(s) there, and recreate the index afterwards.
4. Switch the partition(s) back from staging to the main table.
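The steps above might be sketched like this for merging two monthly partitions; all names (`dbo.Facts`, `dbo.Facts_Staging`, `pf_Facts`, `cci_Staging`) are hypothetical stand-ins for the article's objects:

```sql
-- 1. Switch the affected partition(s) out to staging (metadata-only).
--    Staging has the same schema/filegroup as the source partition.
ALTER TABLE dbo.Facts SWITCH PARTITION 2 TO dbo.Facts_Staging;

-- 2. Merge the now-empty boundary in the main table.
ALTER PARTITION FUNCTION pf_Facts() MERGE RANGE ('2017-02-01');

-- 3. In staging: drop the columnstore index, adjust the partitions there,
--    then recreate the index (only this subset gets rebuilt).
DROP INDEX cci_Staging ON dbo.Facts_Staging;
CREATE CLUSTERED COLUMNSTORE INDEX cci_Staging ON dbo.Facts_Staging;

-- 4. Switch the data back into the merged partition.
ALTER TABLE dbo.Facts_Staging SWITCH TO dbo.Facts PARTITION 2;
```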
Read on for a detailed walkthrough of these steps.
The biggest difference resides in partition function split and merge behavior. With B-tree indexes, you can split and merge non-empty partitions: SQL Server will split or merge the data automatically, granted with a schema-modification (Sch-M) table lock held in place. Other sessions would be unable to access the table, but at least the split and merge would work.
This is not the case with columnstore indexes, where you get an error when you try to split or merge non-empty partitions. There are several reasons for this limitation. Without diving very deep into columnstore index internals, I could say that the only option for doing a split or merge internally is rebuilding the columnstore index on the affected partitions. Fortunately, you can split and merge empty columnstore partitions, which allows you to work around the limitation, implement the Sliding Window pattern, and use partitioning to purge the data.
With SQL Server 2016 and later, the logic gets a little simpler, as you can directly truncate partitions instead of shuffling them off to a separate table.
When it comes to Microsoft SQL Server, things are a bit different, as this database system does not support dynamic partitions, so partitioning the table manually can be a huge maintenance issue.
That being said, in order to create a partitioned table, a procedure similar to the one previously presented must be followed. This time, we will create monthly partitions.
Read on for scripts for each.
Now that I’ve rambled a bit, you want to know why grabbing the min and max of the primary key takes sooooo long on a partitioned table, and how to fix it. Theoretically, you would expect SQL Server to perform the following steps in grabbing the max ID:
- Grab the Max Id from each partition using a seek
- Hold the results in temp storage
- Get the Max ID from the temp storage, and return that result.
However, SQL Server doesn’t do that. It actually scans each partition, finding the max ID only after it has examined every record in each partition. This is very inefficient; it can kill a query that depends on this value and impact a busy server low on physical resources. So what we need to do is manually write code to perform the steps SQL Server should actually be doing.
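One common shape for such a workaround (not necessarily the author's exact script) is to drive a per-partition seek via `$PARTITION` and then take the max of those results. The table, key, and partition function names here are hypothetical:

```sql
-- Seek the max within each partition, then take the max of the maxes
SELECT MAX(mx.MaxId) AS MaxId
FROM sys.partitions AS p
CROSS APPLY (
    SELECT MAX(t.ID) AS MaxId
    FROM dbo.BigTable AS t
    WHERE $PARTITION.pf_BigTable(t.PartitionKey) = p.partition_number
) AS mx
WHERE p.object_id = OBJECT_ID('dbo.BigTable')
  AND p.index_id IN (0, 1);   -- heap or clustered index rows only
```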
Read on for one workaround Ken uses to deal with this inefficiency.