Press "Enter" to skip to content

Category: Partitioning

Hash versus Range Partitioning in PostgreSQL

Umair Shahid explains when hash and range partitioning work best:

I have always been a fan of RANGE partitioning using a date/time value in PostgreSQL. This isn’t always possible, however, and I recently came across a scenario where a table had grown large enough that it had to be partitioned, and the only reasonable key to use was a UUID styled identifier.

The goal of this post is to highlight when and why hashing your data across partitions in PostgreSQL might be a better approach.

Click through to learn more about each style of partitioning, as well as when hash partitioning may actually be the better fit.

Leave a Comment

Vertical Partitioning Rarely Works

Brent Ozar lays out an argument:

You’re looking at a wide table with 100-200 columns.

Years ago, it started as a “normal” table with maybe 10-20, but over the years, people kept gradually adding one column after another. Now, this behemoth is causing you problems because:

  • The table’s size on disk is huge
  • Queries are slow, especially when they do table scans
  • People are still asking for more columns, and you feel guilty saying yes

You’ve started to think about vertical partitioning: splitting the table up into one table with the commonly used columns, and another table with the rarely used columns. You figure you’ll only join to the rarely-used table when you need data from it.

Read on to understand why this is rarely a good idea and what you can do instead.

I will say that I’ve had success with vertical partitioning in very specific circumstances:

  1. There are large columns, like blobs of JSON, binary data, or very large text strings.
  2. There exists a subset of columns the application (or caller) rarely needs.
  3. Those large columns are in the subset of columns the caller rarely needs, or can access via point lookup.

For a concrete example, my team at a prior company worked on a product that performed demand forecasting on approximately 10 million products across the customer base. For each product, we had the choice between using a common model (if the sales fit a common pattern) or generating a unique model. Because we were using SQL Server Machine Learning Services, we needed to store those custom models in the database. But each model, even when compressed, could run in the kilobytes to megabytes in size. We only needed to retrieve the model during training or inference, not reporting, but we did have reports that tracked what kind of model we were using, whether it was a standard model or custom (and if custom, what algorithm we used). Thus, we had the model binary in its own table, separate from the remaining model data.

Comments closed

Partitioned Tables and Indexes in PostgreSQL

Hettie Dombrovskaya runs into an error:

Here is a story. When anyone gives a talk about partitions, they always bring up an example of archiving: there is a partitioned table, and you only keep “current” partitions, whatever “current” means in that context, and after two weeks or a month, or whatever interval works for you, you detach the oldest partition from the “current” table and attach it to the “archived” table, so that the data is still available when you need it, but it does not slow down your “current” queries.

So here is Hettie confidently suggesting that a customer implement this technique to avoid querying a terabyte-plus-size table. A customer happily agrees, and life is great until one day, an archiving job reports an error of a “name already exists” for an index name.

Read on to learn why.

Comments closed

Partitioned Tables and Faster Performance

Henrietta Dombrovskaya provides a warning:

I am unsure whether others have this problem, but I find myself explaining the benefits9and what’s not)of partitioning tables in Postgres over and over again.

Most times, developers have unrealistic expectations that if they partition a gigantic table, “it will be faster to select from it.” I always have to explain that the goal and the purpose of partitioning is improving maintenance, not query optimization, and if the execution speed of short queries remains the same as before partitioning, they should consider it a success (yes, there are no rules without exceptions, and there are cases of performance improvement, but those are exceptions indeed).

Henrietta’s example is specifically around Postgres, but it applies to SQL Server all the same: partitioning is primarily a maintenance benefit rather than a performance benefit.

Comments closed

Fractional Path Performance Issues in Postgres Partitioned Tables

Andrei Lepikhov digs into an interesting finding:

While the user notices the positive aspects of technology, a developer, usually encountering limitations, shortcomings or bugs, watches the product from a completely different perspective. The same stuff happened at this time: after the publication of the comparative testing results, where Join-Order-Benchmark queries were passed on a database with and without partitions, I couldn’t push away the feeling that I had missed something. In my mind, Postgres should build a worse plan with partitions than without them. And this should not be just a bug but a technological limitation. After a second thought, I found a weak spot – queries with limits.

Read on to see what Andrei came up with.

Comments closed

Partitioning in Oracle versus PostgreSQL

Umair Shahid continues a series on migrating from Oracle to PostgreSQL:

Table partitioning is a database design technique that divides a large table into smaller, more manageable sub-tables called partitions. Each partition holds a subset of the data based on specific criteria, such as date ranges, categories, or hash values. While partitioning makes it seem like you’re working with a single large table, behind the scenes, queries and operations are distributed across multiple partitions.

Read on to understand the differences between the two platforms. In this case, there are some pretty significant differences.

Comments closed

Table Partitioning in PostgreSQL

Lee Asher talks partitions:

Partitioning in PostgreSQL was for many years a feature almost entirely ignored, and even today is often thought of only as a means to increase performance on large tables. While it’s certainly useful for this, it’s also an invaluable tool for solving common data maintenance problems. In this article, we’ll examine the feature from this perspective.

Let’s consider a common situation: a table continually ingesting time series data ordered chronologically: purchase orders, forum posts, time-series data, whatever. Most systems contain more than one table like this. And because we live in a world of Big Data, these tables can contain hundreds of millions, even billions of rows. Business rules require the data to remain active for some fixed period of time, then be deleted, (or archived first, then deleted). Commonly this is a monthly process: every month copy data that has “aged out” to another location, then delete the rows from the live table.

It’s interesting that people consider partitioning a performance-improving technique in the PostgreSQL world, but in the SQL Server world, partitioning is primarily for ease of maintenance and we warn against thinking about it as a performance-improving technique.

Comments closed

Three Partitioning Options in Postgres

Semab Tariq shows how to perform three types of partitioning in PostgreSQL:

PostgreSQL is renowned for its exceptional performance in managing data. One of its standout features is partitioning, a technique that divides large datasets into smaller, more manageable segments. Partitioning provides several benefits, including improved query performance, streamlined data management, and enhanced scalability. By organizing data into partitions, PostgreSQL can execute searches more efficiently and handle tasks with greater ease. 

In this blog, we will delve into the details of partitioning in PostgreSQL, exploring its various types, advantages, and drawbacks. We’ll uncover how partitioning can revolutionize data management and decision-making processes in database environments.

Click through for demonstrations of range, list, and hash partitioning.

Comments closed

The Joy of Partitioned Views

Rod Edwards talks partitioned views:

This post came around when I was at a loose end one evening, and just started poking at a local sandpit database, and it got me reminiscing and revisiting / testing a few things. The devil makes work for idle thumbs and all that…

Partitioned Views…do they have a place in society anymore?

Rod does a great job of following Betteridge’s Law of Headlines, as well as saving the ‘Yes’ answer for the post itself. Partitioned views come with their own pains, though one use case Rod did not include is using PolyBase and partitioned views to move “cold” data to slower external storage.

Comments closed

Arbitrary Intervals for Partitioning in Postgres

Keith Fiske does a bit of interval math:

Whether you are managing a large table or setting up automatic archiving, time based partitioning in Postgres is incredibly powerful. pg_partman’s newest versions support a huge variety of custom time internals. Marco just published a post on using pg_partman with our new database product for doing analytics with PostgresCrunchy Bridge for Analytics. So I thought this would be a great time to review the basic and complex options for the time based partitioning.

Read on for a note of how pg_partman works and interval management, especially for versions earlier than 5.0.

Comments closed