Partitioning – Page 3

Partitioning Data in Power BI

Published 2022-11-28 by Kevin Feasel

Paul Turley continues a series on working with large amounts of data in Power BI:

You don’t have to have massive tables to benefit from partitioning. Even tables with a few hundred thousand records can benefit from partitioning, to improve data refresh performance and to detect source data changes. There is little maintenance overhead, so the benefits usually outweigh the cost, in terms of effort and management.

Click through for Paul’s thoughts on the topic.

Comments closed

Replicated Tables in Dedicated SQL Pools

Published 2022-10-21 by Kevin Feasel

Pedro Martinez explains the idea behind replicated tables in Azure Synapse Analytics dedicated SQL pools:

If you have ever used Azure Synapse Analytics dedicated SQL pool you would know there are multiple table types to choose from, for your workload. You might ask yourself, “when can I use Replicated table type and how I can efficiently use them”?
This blog is going to talk in detail about replicated table type, when to use and what are best practices for its usage. But before that, let’s start by understanding the different table types:

I’ve seen replicated tables get overused, so check out Pedro’s advice on how not to get burned with them.

Comments closed

Auto Partitioning Recommendations for Oracle

Published 2022-10-18 by Kevin Feasel

Brendan Tierney checks out some recommendations:

In a previous blog post I gave an overview of the DBMS_AUTO_PARTITION package in Oracle Autonomous Database. This looked at how you can get started and to setup Auto Partitioning and to allow it to automatically implement partitioning.
This might not be something the DBAs will want to happen for lots of different reasons. An alternative is to use DBMS_AUTO_PARTITION to make recommendations for tables where partitioning will have a performance improvement. The DBA can inspect these recommendations and decide which of these to implement.

Read on to see how you can run the recommender, as well as what a recommendation looks like.

Comments closed

The Cost of Empty Partitions in SQL Server

Published 2022-06-22 by Kevin Feasel

Aaron Bertrand reminds us that TANSTAAFL:

Not too long ago, I came across a table that had 15,000 partitions—all but 4 of them empty. I bet when you have implemented partitioning you, too, have wondered: “Why shouldn’t I create all future partitions now?”
The question is valid: wouldn’t maintenance be easier if you only had to phase out old partitions, without ever worrying about adding partitions to accommodate new data?
Here’s the thing. Microsoft will never tell you this, but empty partitions are not free. I don’t mean you will get an invoice for creating too many, but you will pay for them in other ways.

I think the reason people do this sort of thing is that partition management is harder than it really needs to be in SQL Server. Adopting the partitioning process for dedicated SQL pools in Azure Synapse Analytics would be a good start but consider that partitions are almost always date or numeric intervals (often pseudo-date numbers like 20220601 to represent June 1, 2022) and allow people to create partitions based on the key and have the system manage it. I’m being vague and hand-wavey and not talking about all of the edge cases (like if some dope puts in a date for 22220601 instead of 20220601) but I think some PARTITION RANGE RIGHT ON [DateWK] INTERVAL MONTHLY WITH MAX_FUTURE_PARTITIONS=2, MAX_MAINTAINED_PARTITIONS=48, PARTITION_ARCHIVAL_TABLE=History.MyTable or something like that would cut to the core of what partition management does without writing thousand-line scripts full of dynamic SQL trying to manage these things.

Comments closed

Automated Partitioned Table Management

Published 2022-05-04 by Kevin Feasel

Eitan Blumin automates creation and deletion of partitions in SQL Server:

Before we begin, there are a few “ground rules” we should understand first:
1 – Partition Functions define the partition ranges
This means that whenever we want to eliminate an old partition range or add a new partition range, the PARTITION FUNCTION is the object that we actually need to modify.

Click through for Eitan’s entire process and a couple of scripts. This is an area that SQL Server could have made a lot easier, especially for periodic processes, by including options like “Daily” or “Monthly” or “Weekly(start on Monday)” for intervals rather than making people specify every partition separately.

Comments closed

Reasons for Partitioning in SQL Server

Published 2021-12-09 by Kevin Feasel

Erik Darling has opinions:

When I work with clients, nearly every single one has this burning question about partitioning.
“We’ve got this huge table, should we partition it?”
“Do you need to insert or delete data in big chunks?”
“No, it’s all transactional.”
“Do you have last page contention problems?”
“No, but won’t it help performance?”
“No, not unless you’re using clustered column store.”
“…”

Read on to unpack Erik’s argument. I do wish that there were more good cases for partitioning in SQL Server, but they’re almost all in the analytics space—which is part of why partitioning is a lot more useful in Azure Synapse Analytics dedicated SQL pools.

Comments closed

Partition Switching of Staging Data

Published 2021-09-30 by Kevin Feasel

Aaron Bertrand shares a technique to make table refreshes easier for end users:

So, what is a staging table in SQL? A staging table can be more easily understood using a real-world example: Let’s say you have a table full of vegetables you’re selling at the local farmer’s market. As your vegetables sell and you bring in new inventory:
– When you bring a load of new vegetables, it’s going to take you 20 minutes to clear off the table and replace the remaining stock with the newer product.
– You don’t want customers to sit there and wait 20 minutes for the switch to happen, since most will get their vegetables elsewhere.
Now, what if you had a second empty table where you load the new vegetables, and while you’re doing that, customers can still buy the older vegetables from the first table? (Let’s pretend it’s not because the older vegetables went bad or are otherwise less desirable.)

Read on for some techniques Aaron used for a long time and why he switched to partition switching.

Comments closed

Bug around Parallel Eager Spools and Batch Mode

Published 2021-03-24 by Kevin Feasel

Paul White digs into a nasty bug:

A more accurate description of the issue would be:
This bug can cause wrong results, incorrect error messages, and statement failure when:
– A data-modification statement requires Halloween Protection.
– That protection is provided by a Parallel Eager Spool.
– The spool is on the probe side of a Batch Mode Hash Join.
This issue affects Azure SQL Database and SQL Server 2014 to 2019 inclusive.

Read on for a repro and Paul’s thoughts. As of March 2021, this is an active problem, so it’s worth keeping an eye on in your environment. I’d wager, though, that this probably doesn’t pop up on its own very frequently.

Comments closed

The Benefits of Table Partitioning

Published 2021-01-20 by Kevin Feasel

Brenda Bentz lays out some of the benefits of table partitioning in SQL Server:

Table partitioning in MS SQL Server is an advanced option that breaks a table into logically smaller chunk (partition) and then stores each chuck to a multiple filegroups. Each chuck can be accessed and maintained separately. Partitioning is not visible to users; it behaves like one logical table to the end user. Partitioning was introduced in SQL 2005 as an Enterprise edition feature but starting with SQL 2016 SP1 it is available on all the editions.

Table partitioning can work to improve performance in specific circumstances, though you’ll definitely want to do a lot of testing before rolling it out. H/T Amanda White.

Comments closed

Partitioning Tricks

Published 2020-06-24 by Kevin Feasel

Raul Gonzalez shows us five things you can do with partitioning in SQL Server:

Once we have rebuilt that old data to minimise its footprint and moved it to a cheaper storage tier, if we know no one will have to modify it, it’d be a good idea to make it READ_ONLY.
By making the data READ_ONLY, we can not only prevent accidental deletion or modification, but also reduce the workload required to maintain it, because as we’ve seen before, we can action index maintenance only on the READ_WRITE parts (partitions) of the data where fragmentation might still happen.

Read on for the rest of the tips and note that none of these are directly of the “Make your queries faster” variety, though a couple can have positive performance implications.

Comments closed

Category: Partitioning