Partitioning – Curated SQL

Data Archival and Retention in PostgreSQL

Published 2025-05-21 by Kevin Feasel

Daria Nikolaenko walks through a presentation:

I’ve started talking about something that happens with almost every Postgres database — the slow, steady growth of data. Whether it’s logs, events, or transactions — old rows pile up, performance suffers, and managing it all becomes tricky. My talk was focusing on practical ways to archive, retain, and clean up data in PostgreSQL, without breaking queries or causing downtime.

Read on to learn more.

Comments closed

Partitioning in PostgreSQL

Published 2025-05-12 by Kevin Feasel

Umair Shahid takes us into partitioning strategies in PostgreSQL:

My recommended methodology for performance improvement of PostgreSQL starts with query optimization. The second step is architectural improvements, part of which is the partitioning of large tables.

Partitioning in PostgreSQL is one of those advanced features that can be a powerful performance booster. If your PostgreSQL tables are becoming very large and sluggish, partitioning might be the cure.

It’s interesting to compare this against SQL Server, where partitioning is not a strategy for query performance improvements.

Comments closed

Swap-and-Drop for Partition Management

Published 2025-04-15 by Kevin Feasel

Rich Benner deals with a troublesome partition:

What are stubborn partitions in SQL Server and how do you delete them? This was an interesting issue I recently had to deal with on a client site that I thought our readers might find interesting.

The tables in use here are partitioned. The partition field is based upon a date field and we have a partition per month. There is a monthly maintenance job which creates our new partitions. The job should also delete the oldest partitions. This job has been failing to delete an old partition as the data file contained within is not empty. It’s stubborn!

If we try to remove this file we get the error “The File cannot be removed because it is not empty,” as you can see:

Read on for some diagnosis of the problem, as well as the solution Rich developed.

Comments closed

Hash versus Range Partitioning in PostgreSQL

Published 2025-03-12 by Kevin Feasel

Umair Shahid explains when hash and range partitioning work best:

I have always been a fan of RANGE partitioning using a date/time value in PostgreSQL. This isn’t always possible, however, and I recently came across a scenario where a table had grown large enough that it had to be partitioned, and the only reasonable key to use was a UUID styled identifier.

The goal of this post is to highlight when and why hashing your data across partitions in PostgreSQL might be a better approach.

Click through to learn more about each style of partitioning, as well as when hash partitioning may actually be the better fit.

Comments closed

Vertical Partitioning Rarely Works

Published 2025-02-26 by Kevin Feasel

Brent Ozar lays out an argument:

You’re looking at a wide table with 100-200 columns.

Years ago, it started as a “normal” table with maybe 10-20, but over the years, people kept gradually adding one column after another. Now, this behemoth is causing you problems because:

The table’s size on disk is huge

Queries are slow, especially when they do table scans

People are still asking for more columns, and you feel guilty saying yes

You’ve started to think about vertical partitioning: splitting the table up into one table with the commonly used columns, and another table with the rarely used columns. You figure you’ll only join to the rarely-used table when you need data from it.

Read on to understand why this is rarely a good idea and what you can do instead.

I will say that I’ve had success with vertical partitioning in very specific circumstances:

There are large columns, like blobs of JSON, binary data, or very large text strings.
There exists a subset of columns the application (or caller) rarely needs.
Those large columns are in the subset of columns the caller rarely needs, or can access via point lookup.

For a concrete example, my team at a prior company worked on a product that performed demand forecasting on approximately 10 million products across the customer base. For each product, we had the choice between using a common model (if the sales fit a common pattern) or generating a unique model. Because we were using SQL Server Machine Learning Services, we needed to store those custom models in the database. But each model, even when compressed, could run in the kilobytes to megabytes in size. We only needed to retrieve the model during training or inference, not reporting, but we did have reports that tracked what kind of model we were using, whether it was a standard model or custom (and if custom, what algorithm we used). Thus, we had the model binary in its own table, separate from the remaining model data.

Comments closed

Partitioned Tables and Indexes in PostgreSQL

Published 2025-01-27 by Kevin Feasel

Hettie Dombrovskaya runs into an error:

Here is a story. When anyone gives a talk about partitions, they always bring up an example of archiving: there is a partitioned table, and you only keep “current” partitions, whatever “current” means in that context, and after two weeks or a month, or whatever interval works for you, you detach the oldest partition from the “current” table and attach it to the “archived” table, so that the data is still available when you need it, but it does not slow down your “current” queries.

So here is Hettie confidently suggesting that a customer implement this technique to avoid querying a terabyte-plus-size table. A customer happily agrees, and life is great until one day, an archiving job reports an error of a “name already exists” for an index name.

Read on to learn why.

Comments closed

Partitioned Tables and Faster Performance

Published 2024-12-23 by Kevin Feasel

Henrietta Dombrovskaya provides a warning:

I am unsure whether others have this problem, but I find myself explaining the benefits9and what’s not)of partitioning tables in Postgres over and over again.

Most times, developers have unrealistic expectations that if they partition a gigantic table, “it will be faster to select from it.” I always have to explain that the goal and the purpose of partitioning is improving maintenance, not query optimization, and if the execution speed of short queries remains the same as before partitioning, they should consider it a success (yes, there are no rules without exceptions, and there are cases of performance improvement, but those are exceptions indeed).

Henrietta’s example is specifically around Postgres, but it applies to SQL Server all the same: partitioning is primarily a maintenance benefit rather than a performance benefit.

Comments closed

Fractional Path Performance Issues in Postgres Partitioned Tables

Published 2024-12-16 by Kevin Feasel

Andrei Lepikhov digs into an interesting finding:

While the user notices the positive aspects of technology, a developer, usually encountering limitations, shortcomings or bugs, watches the product from a completely different perspective. The same stuff happened at this time: after the publication of the comparative testing results, where Join-Order-Benchmark queries were passed on a database with and without partitions, I couldn’t push away the feeling that I had missed something. In my mind, Postgres should build a worse plan with partitions than without them. And this should not be just a bug but a technological limitation. After a second thought, I found a weak spot – queries with limits.

Read on to see what Andrei came up with.

Comments closed

Partitioning in Oracle versus PostgreSQL

Published 2024-11-07 by Kevin Feasel

Umair Shahid continues a series on migrating from Oracle to PostgreSQL:

Table partitioning is a database design technique that divides a large table into smaller, more manageable sub-tables called partitions. Each partition holds a subset of the data based on specific criteria, such as date ranges, categories, or hash values. While partitioning makes it seem like you’re working with a single large table, behind the scenes, queries and operations are distributed across multiple partitions.

Read on to understand the differences between the two platforms. In this case, there are some pretty significant differences.

Comments closed

Table Partitioning in PostgreSQL

Published 2024-08-19 by Kevin Feasel

Lee Asher talks partitions:

Partitioning in PostgreSQL was for many years a feature almost entirely ignored, and even today is often thought of only as a means to increase performance on large tables. While it’s certainly useful for this, it’s also an invaluable tool for solving common data maintenance problems. In this article, we’ll examine the feature from this perspective.

Let’s consider a common situation: a table continually ingesting time series data ordered chronologically: purchase orders, forum posts, time-series data, whatever. Most systems contain more than one table like this. And because we live in a world of Big Data, these tables can contain hundreds of millions, even billions of rows. Business rules require the data to remain active for some fixed period of time, then be deleted, (or archived first, then deleted). Commonly this is a monthly process: every month copy data that has “aged out” to another location, then delete the rows from the live table.

It’s interesting that people consider partitioning a performance-improving technique in the PostgreSQL world, but in the SQL Server world, partitioning is primarily for ease of maintenance and we warn against thinking about it as a performance-improving technique.

Comments closed

Category: Partitioning