Page-Level Caching in PostgreSQL with PAX

Laetitia Avrot has a two-parter on PAX. The first part is from a couple of months ago and sets the stage:

Picture this: you walk into a library. Each book is a database record. PostgreSQL’s traditional storage (NSM – N-ary Storage Model) stores complete books on each shelf: chapter 1, chapter 2, chapter 3, all bound together.

Here’s the problem that keeps me up at night: when you need only chapter 3 from 1,000 books, you must pull each complete book off the shelf, flip through chapters 1 and 2 that you’ll never read, grab chapter 3, and move to the next book.

You’re wasting time. You’re wasting energy. You’re wasting cache bandwidth.

But it’s not all roses:

PAX looks elegant on paper: minipages, cache locality, column-oriented access inside an 8KB page. But the moment you ask how this actually would work with Postgres, the complexity arrives fast. NULL values, variable-length types, MVCC, boundary shifts. Let’s go through it all.

To be clear, this is not a product (today). It’s thinking about how to apply the results of an academic paper to Postgres and all of the tricky challenges that can come from what seems to be an easy idea.

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30