Partitioned Views With Polybase

I look at using SQL 2000-style partitioning with a set of external tables:

Using a view, we were able to create a “partitioned” Polybase experience, similar to what we had in SQL Server 2000.  This form of poor man’s partitioning allows us to segment out data sets and query them independently, something which can be helpful when storing very large amounts of data off-site and only occasionally needing to query it.  The thing to remember, though, is that if you store this in Azure Blob Storage, you will need to pull down the entire table’s worth of data to do any processing.

This leads to a concept I first heard from Ginger Grant:  pseudo-StretchDB.  Instead of paying for what Stretch offers, you get an important subset of the functionality at a much, much lower price.  If you do store the data in Azure Blob Storage, you’re paying pennies per gigabyte per month.  For cold storage, like a scenario in which you need to keep data around to keep the auditors happy but your main application doesn’t use that information, it can work fine.  But if you need to query this data frequently, performance might be a killer.

For Polybase tables without the ability to perform external pushdown, coming up with a good partitioning strategy is probably one of the two best ways to improve performance, with creating a Polybase scale-out cluster the other method.

Related Posts

Switching Partitions And Table Structure

Andrew Pruski demonstrates a gotcha when switching partitions between tables: When working with partitioning the SWITCH operation has to be my favourite. The ability to move a large amount of data from one table to another as a META DATA ONLY operation is absolutely fantastic. What’s also cool is that we can switch data into a non-partitioned table. Makes […]

Read More

Gotchas When Indexing Partitioned Tables

Andrew Pruski gives us a couple of considerations when creating indexes on partitioned tables in SQL Server: Looking at that data page, we can see that SQL has added a UNIQUIFIER column. Now this is standard SQL behaviour, SQL does this to all non-unique clustered indexes whether they are on a partitioned table or not. But also […]

Read More


January 2017
« Dec Feb »