
Category: Partitioning

Columnstore Partitioning

Niko Neugebauer warns against partitioning small tables with clustered columnstore indexes:

Needless to say, looking at the execution plans, you notice that the actual execution plans show a 10-times difference between them, even though both tables contain the very same data!
The query cost for the partitioned table is staggering: it is around 10 times bigger (~8.8 vs. ~0.81 for the first query).
The execution times reflect this situation in part: 12 ms vs 91 ms. The non-partitioned table performs almost 9 times faster overall, and the CPU time spent reflects it: 15 ms vs 94 ms. Remember that both tables are based on columnstore indexes! Partitioning your table in the wrong way will carry a huge penalty that might not be directly detectable through the execution plans of complex queries. Well, you might want to use the CISL, just saying.

If you can’t fill a single rowgroup, your partition is too granular.  Even then, I’d like to see double-digit rowgroups per partition, though that’s just me.
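To put numbers on that rule of thumb, here is a minimal sketch of the arithmetic. It relies on the fact that a compressed columnstore rowgroup holds at most 1,048,576 rows; the function names are made up for illustration, and the even spread of rows across partitions is an assumption that real data rarely honors.

```python
# Maximum number of rows in one compressed columnstore rowgroup.
MAX_ROWS_PER_ROWGROUP = 1_048_576


def rowgroups_per_partition(total_rows: int, partition_count: int) -> float:
    """Full rowgroups each partition could fill, assuming rows are
    spread evenly across partitions (an assumption for illustration)."""
    rows_per_partition = total_rows / partition_count
    return rows_per_partition / MAX_ROWS_PER_ROWGROUP


def max_partitions(total_rows: int, min_rowgroups: int = 1) -> int:
    """Largest partition count that still leaves at least
    `min_rowgroups` full rowgroups in every partition."""
    return total_rows // (min_rowgroups * MAX_ROWS_PER_ROWGROUP)
```

Under those assumptions, a 100 million row table supports at most 95 partitions if each must fill one rowgroup (`max_partitions(100_000_000)`), and only 9 if you want ten rowgroups apiece (`max_partitions(100_000_000, 10)`).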


Partitioned Views With Polybase

I look at using SQL 2000-style partitioning with a set of external tables:

Using a view, we were able to create a “partitioned” Polybase experience, similar to what we had in SQL Server 2000.  This form of poor man’s partitioning allows us to segment out data sets and query them independently, something which can be helpful when storing very large amounts of data off-site and only occasionally needing to query it.  The thing to remember, though, is that if you store this in Azure Blob Storage, you will need to pull down the entire table’s worth of data to do any processing.

This leads to a concept I first heard from Ginger Grant:  pseudo-StretchDB.  Instead of paying for what Stretch offers, you get an important subset of the functionality at a much, much lower price.  If you do store the data in Azure Blob Storage, you’re paying pennies per gigabyte per month.  For cold storage, like a scenario in which you need to keep data around to keep the auditors happy but your main application doesn’t use that information, it can work fine.  But if you need to query this data frequently, performance might be a killer.

For Polybase tables without the ability to perform external pushdown, coming up with a good partitioning strategy is probably one of the two best ways to improve performance, with creating a Polybase scale-out cluster the other method.
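For anyone who has not seen the SQL 2000-style pattern, the shape is roughly as follows. This is only a sketch: it uses Python's built-in `sqlite3` with ordinary tables standing in for external tables, and the table and view names (`sales_2014`, `sales_all`, and so on) are hypothetical. It shows the view-over-member-tables layout and says nothing about how Polybase actually processes such a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One member table per year (stand-ins for per-year external tables).
for year in (2014, 2015, 2016):
    cur.execute(
        f"CREATE TABLE sales_{year} (sale_year INTEGER NOT NULL, amount REAL NOT NULL)"
    )
    cur.execute(f"INSERT INTO sales_{year} VALUES (?, ?)", (year, 100.0 + year - 2014))

# The "partitioned" view is simply a UNION ALL over the member tables.
cur.execute(
    "CREATE VIEW sales_all AS "
    "SELECT sale_year, amount FROM sales_2014 "
    "UNION ALL SELECT sale_year, amount FROM sales_2015 "
    "UNION ALL SELECT sale_year, amount FROM sales_2016"
)

# Query everything through the view...
all_rows = cur.execute("SELECT COUNT(*) FROM sales_all").fetchone()[0]

# ...or query one segment independently by going straight to its member table.
only_2015 = cur.execute("SELECT amount FROM sales_2015").fetchall()
```

The appeal is that each member table can live in its own location and be queried on its own, while the view gives the application a single name to select from.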


Partition Handling In Spark 2.1

Eric Liang, et al., discuss a change in Spark 2.1 which will make certain partitioned table access faster:

In Spark 2.1, we drastically improve the initial latency of queries that touch a small fraction of table partitions. In some cases, queries that took tens of minutes on a fresh Spark cluster now execute in seconds. Our improvements cut down on table memory overheads, and make the SQL experience starting cold comparable to that on a “hot” cluster with table metadata fully cached in memory.

This looks like a nice improvement in Spark.


Quickly Reloading Tables

Kenneth Fisher uses table partitioning to perform fast loads of data:

Now if this table is partitioned you’d use SWITCH and bring in a new partition.

For those that don’t know, when a table is partitioned, you can create a new empty partition, and a new empty table, load the table, make the table exactly match the partition (structure, check constraints, & indexes for example) and you can SWITCH it in. The SWITCH part is a metadata operation and is fast!

But what do you do if the table isn’t partitioned? Well, I was having a conversation with Andy Mallon and he reminded me of something.

Read on for the details.  The upshot is that you can take your time loading the second table and once you’re ready to swap out, it’s a quick metadata change.  That’s really useful for ETL scenarios.
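The general load-then-swap pattern looks something like the sketch below. To be clear about assumptions: this uses Python's built-in `sqlite3` and a pair of table renames as the swap, which is my stand-in and not necessarily the mechanism from the linked post; the table names (`report`, `report_staging`) are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the transaction around the swap is managed explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO report VALUES (1, 'old')")

# Slow part: build and load the replacement while readers still use `report`.
conn.execute("CREATE TABLE report_staging (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO report_staging VALUES (?, ?)", [(1, "new"), (2, "new")]
)

# Fast part: swap the names inside a single transaction.
conn.execute("BEGIN")
conn.execute("ALTER TABLE report RENAME TO report_old")
conn.execute("ALTER TABLE report_staging RENAME TO report")
conn.execute("COMMIT")

# Clean up the previous copy once nothing needs it.
conn.execute("DROP TABLE report_old")

current = conn.execute("SELECT id, val FROM report ORDER BY id").fetchall()
```

The load can run for as long as it needs, because nothing reads the staging table until the swap, and the swap itself touches only metadata.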


Table Partitioning And Performance

Kendra Little has a video on table partitioning:

Learn why SQL Server’s table partitioning feature doesn’t make your queries faster– and may even make them slower.

In this 20 minute video, I’ll show you my favorite articles, bugs, and whitepapers online to explain where table partitioning shines and why you might want to implement it, even though it won’t solve your query performance problems.

Articles discussed are by Gail Shaw, Remus Rusanu, and the SQL Customer Advisory Team (SQLCAT). Scroll down below the video for direct links to each resource.

Check out the video.


Partition Switching

James Anderson gives his favorite feature in SQL Server:

The good news is that the SWITCH command works on regular tables and in any edition. This means I can quickly transfer all of the data from one table to another in Standard Edition.

In reality, I have found few uses for the regular table to regular table switch; the trick to add the IDENTITY property to a column with existing data is the most recent. SWITCH is most useful when partitioned tables are involved. Sorry, Standard Edition users.

Partition switching is a fascinating solution for a difficult technical problem.


Writable Partition Failure

Paul White shows us a scenario in which attempts to update a writable partition could fail:

The where clause is exactly the same as before. The only difference is that we are now (deliberately) setting the partitioning column equal to itself. This will not change the value stored in that column, but it does affect the outcome. The update now succeeds (albeit with a more complex execution plan):

The optimizer has introduced new Split, Sort, and Collapse operators, and added the machinery necessary to maintain each potentially-affected nonclustered index separately (using a wide, or per-index strategy).

Read on for the reason why this happens, as well as a few solutions.


Remember Partition Alignment

Kendra Little shows that truncating partitions in SQL Server 2016 requires all indexes be aligned to the partition:

If you have a non-aligned index on the table, you’ll see an error like this:

Msg 3756, Level 16, State 1, Line 1

TRUNCATE TABLE statement failed. Index ‘ix_FirstNameByBirthDate_pt_BirthYear_FirstNameId_nonaligned’ is not partitioned, but table ‘FirstNameByBirthDate_pt’ uses partition function ‘pf_fnbd’. Index and table must use an equivalent partition function.

This isn’t a bug, and it makes total sense from a logical point of view. “Non-aligned” indexes are not partitioned like the base table is– by definition they are either partitioned differently, or not partitioned at all. The chunk of data that you’re trying to truncate isn’t all in an easily identifiable partition that can be quickly marked as “data non grata”. There’s just no way to do a simple truncate when the data’s scattered all around.

Ideally, all indexes on a partitioned table would be partition-aligned, as it makes maintenance a lot easier.  That’s not always possible, though, so keep this in mind.


DATETIME2 Partition Elimination

Kendra Little shows that DATETIME2 implicit conversion can prevent partition elimination:

SQL Server is implicitly converting my date value to DATETIME2(7). That is a larger, more precise value than the data type I have in the table–FakeBirthDateStamp is DATETIME2(0).

That data type mismatch is preventing partition elimination!

This is a nasty issue to catch in production, especially after you spend a bunch of time arguing with devs that DATETIME2 is the way of the future, that it’s better because of the variable precision, etc.


Table Partitioning

Slava Murygin walks through partitioning:

Usually new partition ranges are added to the end of a function/scheme, but in my example I’ll demonstrate how to split a middle partition.
First we create a new filegroup. Then we mark it as the next used filegroup within the scheme and split the function:

All records that are less than 1/1/2015 and more than 1/1/2014 were moved to the new partition.

Partitioning is one of those topics I run into frequently enough to need to know it, but not frequently enough to memorize it; every time it feels like I’m starting from scratch.
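Since I always have to re-derive which rows land where after a split, here is a small model of it in Python. The assumptions are mine: a RANGE RIGHT function (each boundary value belongs to the partition on its right), hypothetical boundary dates, and a handful of made-up row dates. It models only the partition numbering, not filegroups or the scheme.

```python
from bisect import bisect_right, insort
from datetime import date


def partition_number(boundaries: list, value: date) -> int:
    """1-based partition number under RANGE RIGHT semantics:
    a value equal to a boundary falls in the partition to its right."""
    return bisect_right(boundaries, value) + 1


# Hypothetical starting boundaries: three partitions.
boundaries = [date(2013, 1, 1), date(2015, 1, 1)]
rows = [date(2012, 6, 1), date(2013, 6, 1), date(2014, 6, 1), date(2015, 6, 1)]

before = [partition_number(boundaries, d) for d in rows]

# Splitting a middle partition amounts to inserting a new boundary value.
insort(boundaries, date(2014, 1, 1))

after = [partition_number(boundaries, d) for d in rows]
```

In this model `before` is `[1, 2, 2, 3]` and `after` is `[1, 2, 3, 4]`: the row dated in 2014 moves out of the old middle partition, and every partition to the right of the split is renumbered.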
