
Category: Partitioning

Enforcing Constraints across Postgres Partitions

Shaun Thomas explains a rule:

Postgres table partitioning is one of those features that feels like a superpower right up until it isn’t. Just define a partition key, carve up data into manageable chunks, and everything hums along beautifully. And what’s not to love? Partition pruning in query plans, smaller tables, faster maintenance, easy archiving of old data; it’s a smorgasbord of convenience.

Then you try to enforce a unique constraint without including the partition key, and Postgres behaves as if you just asked it to divide by zero. Well… about that.

Click through for an explanation, some workarounds that might work in specific circumstances, and a few closing remarks.

As for SQL Server, a similar rule applies. A unique constraint is enforced via a unique index under the covers, and for an index aligned with the partition scheme, the partitioning column must be part of the unique index key; otherwise, SQL Server raises an error rather than adding the column for you. It is only for nonunique aligned indexes that SQL Server silently includes the partitioning column on your behalf.
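To make the PostgreSQL side of that rule concrete, here is a minimal sketch (table and constraint names are hypothetical):

```sql
-- A range-partitioned table keyed on a timestamp:
CREATE TABLE events (
    event_id   bigint,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- This fails, because the constraint omits the partition key:
-- ERROR: unique constraint on partitioned table must include all partitioning columns
ALTER TABLE events ADD CONSTRAINT uq_event UNIQUE (event_id);

-- This succeeds, because the partition key is part of the constraint,
-- letting each partition enforce uniqueness locally:
ALTER TABLE events ADD CONSTRAINT uq_event UNIQUE (event_id, created_at);
```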


Vertical Partitioning for Performance

Eran Golan splits out a table:

Not long ago, I worked with a customer who was experiencing persistent blocking and occasional deadlocks in one of their core systems. The application itself wasn’t new, but over the years it had grown significantly. New features had been added, more processes were interacting with the database, and naturally the schema had evolved along the way.

One table in particular stood out. It had gradually grown to contain well over a hundred columns. Originally it had been designed to represent a single business entity in one place, which made the model easy to understand and query. But as more attributes were added over time, the table became increasingly wide.

Frankly, based on Eran’s description, this sounds like a failure to normalize the table appropriately. Normalization is not just about “there are many of X to one Y, so make two separate tables for X and Y.” In particular, 5th normal form (every join dependency is implied by the candidate keys) tells us that, if we can break a table X into X1 and X2, and then join X1 and X2 back together without losing any information or generating spurious new information, then 5NF requires we break it out. Eran is describing exactly that concept in narrative form, though the way the customer broke the data out may or may not have satisfied 5NF.
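As a sketch of the idea, here is the classic 5NF illustration (all table names hypothetical): suppose an agent sells a product for a company exactly when the agent represents the company, the company makes the product, and the agent sells the product. That business rule is a join dependency, so the three-column relation decomposes losslessly into three projections:

```sql
-- Three projections of the original (agent, company, product) relation:
CREATE TABLE agent_company   (agent varchar(50), company varchar(50));
CREATE TABLE company_product (company varchar(50), product varchar(50));
CREATE TABLE agent_product   (agent varchar(50), product varchar(50));

-- Joining the projections reconstructs the original relation without
-- losing rows or generating spurious ones -- the 5NF decomposition test:
SELECT ac.agent, ac.company, cp.product
FROM agent_company AS ac
JOIN company_product AS cp ON cp.company = ac.company
JOIN agent_product   AS ap ON ap.agent = ac.agent
                          AND ap.product = cp.product;
```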


ALTER TABLE and Partitioned Tables in PostgreSQL

Chao Li classifies a series of commands:

Does an operation propagate to partitions? Does it affect future partitions? Does ONLY do what it claims? Why do some commands work on parents but not on partitions—or vice versa?

Today, PostgreSQL documentation describes individual ALTER TABLE sub-commands well, but it rarely explains their interaction with partitioned tables as a whole. As a result, users often discover the real behavior only through trial and error.

This post summarizes a systematic investigation of ALTER TABLE behavior on partitioned tables, turning scattered rules into a consistent classification model.

Click through for 15 buckets of commands relating to ALTER TABLE in PostgreSQL and see how they handle dealing with partitioned tables.
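As a small taste of the asymmetries Chao catalogs, here is a hedged sketch (hypothetical table names) of one command that recurses and one that refuses to run where you might expect it to:

```sql
-- A range-partitioned parent with one partition:
CREATE TABLE measurements (
    id     bigint,
    logged date NOT NULL
) PARTITION BY RANGE (logged);

CREATE TABLE measurements_2024 PARTITION OF measurements
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Adding a column on the parent recurses to every existing partition
-- and applies to future ones as well:
ALTER TABLE measurements ADD COLUMN note text;

-- But the same command directly on a partition fails, because a
-- partition's column set must match its parent:
-- ERROR: cannot add column to a partition
-- ALTER TABLE measurements_2024 ADD COLUMN extra text;
```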


Function Volatility and PostgreSQL Partition Performance

Deepak Mahto covers how function volatility can affect how queries on partitioned data perform:

In one of our earlier blogs, we explored how improper volatility settings in PL/pgSQL functions — namely using IMMUTABLE, STABLE, or VOLATILE — can lead to unexpected behavior and performance issues during migrations.

Today, let’s revisit that topic from a slightly different lens. This time, we’re not talking about your user-defined functions, but the ones PostgreSQL itself provides — and how their volatility can quietly shape your query performance, especially when you’re using partitioned tables.

Click through for one example using date-based partitioning and date functions.
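The general shape of the issue, in a minimal sketch (hypothetical table name): a STABLE function like CURRENT_DATE cannot be folded to a constant at plan time, so partition pruning is deferred to execution, whereas a literal prunes during planning.

```sql
-- A date-partitioned table:
CREATE TABLE sales (
    sale_id   bigint,
    sale_date date NOT NULL
) PARTITION BY RANGE (sale_date);

-- CURRENT_DATE is STABLE, not IMMUTABLE, so the planner keeps all
-- partitions in the plan and prunes them at executor startup instead:
EXPLAIN SELECT * FROM sales WHERE sale_date = CURRENT_DATE;

-- A literal (or any IMMUTABLE expression) allows plan-time pruning,
-- so only the matching partition appears in the plan:
EXPLAIN SELECT * FROM sales WHERE sale_date = DATE '2024-01-15';
```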


Statistics on Partitioned Tables in PostgreSQL

Laurenz Albe gathers stats:

I recently helped a customer with a slow query. Eventually, an ANALYZE on a partitioned table was enough to fix the problem. This came as a surprise for the customer, since autovacuum was enabled. So I decided to write an article on how PostgreSQL collects partitioned table statistics and how they affect PostgreSQL’s estimates.

Read on to see how it works and how you can generate statistics at the table level and not just the partition level.
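The short version, as a sketch (hypothetical table name): autovacuum analyzes the individual partitions but never processes the partitioned parent itself, so parent-level statistics only appear after a manual ANALYZE.

```sql
-- Collect statistics for the partitioned parent (and its partitions):
ANALYZE sales;

-- Verify that parent-level statistics now exist; rows show up in
-- pg_stats under the parent table's own name:
SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename = 'sales';
```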


Data Archival and Retention in PostgreSQL

Daria Nikolaenko walks through a presentation:

I’ve started talking about something that happens with almost every Postgres database — the slow, steady growth of data. Whether it’s logs, events, or transactions — old rows pile up, performance suffers, and managing it all becomes tricky. My talk focused on practical ways to archive, retain, and clean up data in PostgreSQL without breaking queries or causing downtime.

Read on to learn more.


Partitioning in PostgreSQL

Umair Shahid takes us into partitioning strategies in PostgreSQL:

My recommended methodology for performance improvement of PostgreSQL starts with query optimization. The second step is architectural improvements, part of which is the partitioning of large tables.

Partitioning in PostgreSQL is one of those advanced features that can be a powerful performance booster. If your PostgreSQL tables are becoming very large and sluggish, partitioning might be the cure. 

It’s interesting to compare this against SQL Server, where partitioning is not a strategy for query performance improvements.


Swap-and-Drop for Partition Management

Rich Benner deals with a troublesome partition:

What are stubborn partitions in SQL Server and how do you delete them? This was an interesting issue I recently had to deal with on a client site that I thought our readers might find interesting.

The tables in use here are partitioned. The partition field is based upon a date field and we have a partition per month. There is a monthly maintenance job which creates our new partitions. The job should also delete the oldest partitions. This job has been failing to delete an old partition as the data file contained within is not empty. It’s stubborn!

If we try to remove this file we get the error “The File cannot be removed because it is not empty,” as you can see:

Read on for some diagnosis of the problem, as well as the solution Rich developed.
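The swap-and-drop pattern itself looks roughly like this sketch (all object, filegroup, and boundary names hypothetical; the staging table must match the source table’s schema, indexes, and filegroup for the switch to succeed):

```sql
-- An empty staging table on the same filegroup as the stubborn partition:
CREATE TABLE dbo.Sales_Staging (
    SaleID   bigint    NOT NULL,
    SaleDate datetime2 NOT NULL
) ON [FG_2020_01];

-- Switch the partition's rows out; this is a metadata-only operation:
ALTER TABLE dbo.Sales SWITCH PARTITION 1 TO dbo.Sales_Staging;

-- Drop the staging table to release the space, leaving the partition empty:
DROP TABLE dbo.Sales_Staging;

-- With the partition empty, the boundary can be merged and the file removed:
ALTER PARTITION FUNCTION pf_SalesByMonth() MERGE RANGE ('2020-02-01');
```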


Hash versus Range Partitioning in PostgreSQL

Umair Shahid explains when hash and range partitioning work best:

I have always been a fan of RANGE partitioning using a date/time value in PostgreSQL. This isn’t always possible, however, and I recently came across a scenario where a table had grown large enough that it had to be partitioned, and the only reasonable key to use was a UUID styled identifier.

The goal of this post is to highlight when and why hashing your data across partitions in PostgreSQL might be a better approach.

Click through to learn more about each style of partitioning, as well as when hash partitioning may actually be the better fit.
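Side by side, the two styles look like this minimal sketch (hypothetical table names): range suits an ordered key like a date, while hash spreads an unordered key like a UUID evenly across a fixed number of partitions.

```sql
-- Range partitioning on a date column:
CREATE TABLE events_by_date (
    id         uuid,
    created_at date NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024 PARTITION OF events_by_date
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Hash partitioning on a UUID key, spread across four partitions:
CREATE TABLE events_by_id (
    id uuid NOT NULL
) PARTITION BY HASH (id);

CREATE TABLE events_p0 PARTITION OF events_by_id
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ...and likewise for REMAINDER 1 through 3.
```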


Vertical Partitioning Rarely Works

Brent Ozar lays out an argument:

You’re looking at a wide table with 100-200 columns.

Years ago, it started as a “normal” table with maybe 10-20, but over the years, people kept gradually adding one column after another. Now, this behemoth is causing you problems because:

  • The table’s size on disk is huge
  • Queries are slow, especially when they do table scans
  • People are still asking for more columns, and you feel guilty saying yes

You’ve started to think about vertical partitioning: splitting the table up into one table with the commonly used columns, and another table with the rarely used columns. You figure you’ll only join to the rarely-used table when you need data from it.

Read on to understand why this is rarely a good idea and what you can do instead.

I will say that I’ve had success with vertical partitioning in very specific circumstances:

  1. There are large columns, like blobs of JSON, binary data, or very large text strings.
  2. There exists a subset of columns the application (or caller) rarely needs.
  3. Those large columns fall within the subset the caller rarely needs, or can be retrieved via point lookup when they are needed.

For a concrete example, my team at a prior company worked on a product that performed demand forecasting on approximately 10 million products across the customer base. For each product, we had the choice between using a common model (if the sales fit a common pattern) or generating a unique model. Because we were using SQL Server Machine Learning Services, we needed to store those custom models in the database. But each model, even when compressed, could run from kilobytes to megabytes in size. We only needed to retrieve the model during training or inference, not reporting, though we did have reports that tracked whether each product used a standard or custom model (and if custom, which algorithm). Thus, we had the model binary in its own table, separate from the remaining model data.
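That design can be sketched as follows (all table and column names hypothetical, simplified from what we actually ran): the narrow metadata table serves the reports, and the wide binary table is touched only by point lookup during training or inference.

```sql
-- Hot, narrow table: what the reports query.
CREATE TABLE dbo.ProductModel (
    ProductID int          NOT NULL PRIMARY KEY,
    ModelType varchar(20)  NOT NULL,  -- 'Standard' or 'Custom'
    Algorithm varchar(50)  NULL       -- populated for custom models
);

-- Cold, wide table: one large serialized model per product.
CREATE TABLE dbo.ProductModelBinary (
    ProductID    int NOT NULL PRIMARY KEY
        REFERENCES dbo.ProductModel (ProductID),
    ModelPayload varbinary(max) NOT NULL
);

-- Training and inference fetch the payload by point lookup only:
SELECT ModelPayload
FROM dbo.ProductModelBinary
WHERE ProductID = @ProductID;
```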
