Press "Enter" to skip to content

Category: Partitioning

An Overview of Partitioning and Sharding in Postgres

Michael Christofides defines terms:

It has been possible to do partitioning in PostgreSQL for quite a while — splitting what is logically one large table into smaller physical tables. Since version 10, a huge leap was made with the introduction of declarative partitioning, and more improvements have come every year since.

Sharding is a different story — splitting what is logically one large database into smaller physical databases. The primary tool for this in the PostgreSQL ecosystem is the Citus extension. But you can also handle the sharding logic at the application level, as recent posts from the likes of Notion and Figma have described. Somewhat confusingly, some forms of sharding are sometimes referred to as vertical partitioning, including by the team at Figma.

Read on for a few thoughts on when to perform each action and what the costs and benefits are.

Comments closed

Automatic Partition Maintenance in Power BI

Shabnam Watson answers an attendee question:

During one of my presentations on Incremental Refresh (IR) in Power BI, someone asked what happens during a Power BI automatic partition maintenance window when Power BI has an opportunity to merge smaller partitions into larger ones. Does Power BI use the data that is already imported into Power BI for the smaller partitions and combine it into a bigger one or does it re-read the data for those smaller partitions again. For example, if a dataset has an IR policy to refresh the last 1 day, and it has read data for all the days in a previous month, one day for each, when the new month arrives, it has an opportunity to merge the smaller day partitions into a month partition for the previous month. Does it re-read the previous month’s data from the source again or does it use what it already has in its memory?

Read on for the answer.

Comments closed

Partitioning Data in Power BI

Paul Turley continues a series on working with large amounts of data in Power BI:

You don’t have to have massive tables to benefit from partitioning. Even tables with a few hundred thousand records can benefit from partitioning, to improve data refresh performance and to detect source data changes. There is little maintenance overhead, so the benefits usually outweigh the cost, in terms of effort and management.

Click through for Paul’s thoughts on the topic.

Comments closed

Replicated Tables in Dedicated SQL Pools

Pedro Martinez explains the idea behind replicated tables in Azure Synapse Analytics dedicated SQL pools:

If you have ever used Azure Synapse Analytics dedicated SQL pool you would know there are multiple table types to choose from, for your workload. You might ask yourself, “when can I use Replicated table type and how I can efficiently use them”?  

This blog is going to talk in detail about replicated table type, when to use and what are best practices for its usage. But before that, let’s start by understanding the different table types: 

I’ve seen replicated tables get overused, so check out Pedro’s advice on how not to get burned with them.

Comments closed

Auto Partitioning Recommendations for Oracle

Brendan Tierney checks out some recommendations:

In a previous blog post I gave an overview of the DBMS_AUTO_PARTITION package in Oracle Autonomous Database. This looked at how you can get started and to setup Auto Partitioning and to allow it to automatically implement partitioning.

This might not be something the DBAs will want to happen for lots of different reasons. An alternative is to use DBMS_AUTO_PARTITION to make recommendations for tables where partitioning will have a performance improvement. The DBA can inspect these recommendations and decide which of these to implement.

Read on to see how you can run the recommender, as well as what a recommendation looks like.

Comments closed

The Cost of Empty Partitions in SQL Server

Aaron Bertrand reminds us that TANSTAAFL:

Not too long ago, I came across a table that had 15,000 partitions—all but 4 of them empty. I bet when you have implemented partitioning you, too, have wondered: “Why shouldn’t I create all future partitions now?”

The question is valid: wouldn’t maintenance be easier if you only had to phase out old partitions, without ever worrying about adding partitions to accommodate new data?

Here’s the thing. Microsoft will never tell you this, but empty partitions are not free. I don’t mean you will get an invoice for creating too many, but you will pay for them in other ways.

I think the reason people do this sort of thing is that partition management is harder than it really needs to be in SQL Server. Adopting the partitioning process for dedicated SQL pools in Azure Synapse Analytics would be a good start but consider that partitions are almost always date or numeric intervals (often pseudo-date numbers like 20220601 to represent June 1, 2022) and allow people to create partitions based on the key and have the system manage it. I’m being vague and hand-wavey and not talking about all of the edge cases (like if some dope puts in a date for 22220601 instead of 20220601) but I think some PARTITION RANGE RIGHT ON [DateWK] INTERVAL MONTHLY WITH MAX_FUTURE_PARTITIONS=2, MAX_MAINTAINED_PARTITIONS=48, PARTITION_ARCHIVAL_TABLE=History.MyTable or something like that would cut to the core of what partition management does without writing thousand-line scripts full of dynamic SQL trying to manage these things.

Comments closed

Automated Partitioned Table Management

Eitan Blumin automates creation and deletion of partitions in SQL Server:

Before we begin, there are a few “ground rules” we should understand first:

1 – Partition Functions define the partition ranges

This means that whenever we want to eliminate an old partition range or add a new partition range, the PARTITION FUNCTION is the object that we actually need to modify.

Click through for Eitan’s entire process and a couple of scripts. This is an area that SQL Server could have made a lot easier, especially for periodic processes, by including options like “Daily” or “Monthly” or “Weekly(start on Monday)” for intervals rather than making people specify every partition separately.

Comments closed

Reasons for Partitioning in SQL Server

Erik Darling has opinions:

When I work with clients, nearly every single one has this burning question about partitioning.

“We’ve got this huge table, should we partition it?”

“Do you need to insert or delete data in big chunks?”

“No, it’s all transactional.”

“Do you have last page contention problems?”

“No, but won’t it help performance?”

“No, not unless you’re using clustered column store.”

“…”

Read on to unpack Erik’s argument. I do wish that there were more good cases for partitioning in SQL Server, but they’re almost all in the analytics space—which is part of why partitioning is a lot more useful in Azure Synapse Analytics dedicated SQL pools.

Comments closed

Partition Switching of Staging Data

Aaron Bertrand shares a technique to make table refreshes easier for end users:

So, what is a staging table in SQL? A staging table can be more easily understood using a real-world example: Let’s say you have a table full of vegetables you’re selling at the local farmer’s market. As your vegetables sell and you bring in new inventory:

– When you bring a load of new vegetables, it’s going to take you 20 minutes to clear off the table and replace the remaining stock with the newer product.

– You don’t want customers to sit there and wait 20 minutes for the switch to happen, since most will get their vegetables elsewhere.

Now, what if you had a second empty table where you load the new vegetables, and while you’re doing that, customers can still buy the older vegetables from the first table? (Let’s pretend it’s not because the older vegetables went bad or are otherwise less desirable.)

Read on for some techniques Aaron used for a long time and why he switched to partition switching.

Comments closed

Bug around Parallel Eager Spools and Batch Mode

Paul White digs into a nasty bug:

more accurate description of the issue would be:

This bug can cause wrong results, incorrect error messages, and statement failure when:

– A data-modification statement requires Halloween Protection.
– That protection is provided by a Parallel Eager Spool.
– The spool is on the probe side of a Batch Mode Hash Join.

This issue affects Azure SQL Database and SQL Server 2014 to 2019 inclusive.

Read on for a repro and Paul’s thoughts. As of March 2021, this is an active problem, so it’s worth keeping an eye on in your environment. I’d wager, though, that this probably doesn’t pop up on its own very frequently.

Comments closed