
Write Storms and PostgreSQL

Shaun Thomas talks checkpoints:

Every database has to reconcile two uncomfortable truths: memory is fast but volatile, and disk is slow but durable. Postgres handles this tension through its Write-Ahead Log (WAL), which records every change before it happens. But the WAL can’t grow forever. At some point, Postgres needs to flush all those accumulated dirty pages to disk and declare a clean starting point. That process is called a checkpoint, and when it goes wrong, it can bring throughput to its knees.

One thing I would note is that direct-attached NVMe storage is approximately one order of magnitude slower than RAM. Yeah, that’s still a lot slower, but the gap has closed significantly. If you have PCIe 5 NVMe drives (call that 12-14 GB/sec) and relatively slow RAM (20 GB/sec), it’s getting close to on par. But once you step down from top-of-the-line disk speeds, you add more orders of magnitude and everything Shaun describes becomes a problem again.
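To put rough numbers on that best case, here is a minimal sketch in Python that uses only the figures above (12-14 GB/sec for the drive, 20 GB/sec for the RAM). The function name `bandwidth_gap` is my own, and this compares sequential throughput only; it says nothing about latency.

```python
import math


def bandwidth_gap(ram_gb_per_s: float, disk_gb_per_s: float) -> tuple[float, float]:
    """Return (ratio, orders_of_magnitude) by which RAM throughput exceeds disk throughput."""
    ratio = ram_gb_per_s / disk_gb_per_s
    return ratio, math.log10(ratio)


# "Relatively slow RAM" figure from the paragraph above, in GB/sec.
RAM_SLOW = 20.0

# Low and high ends of the PCIe 5 NVMe range from the paragraph above, in GB/sec.
for disk in (12.0, 14.0):
    ratio, orders = bandwidth_gap(RAM_SLOW, disk)
    print(f"{disk:.0f} GB/sec NVMe: RAM is {ratio:.2f}x faster "
          f"({orders:.2f} orders of magnitude)")
```

Swap in the throughput of whatever storage you actually have to see how many orders of magnitude you add back.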

Jeremy Schneider offers a follow-up involving autovacuum_cost_delay:

A few days ago, Shaun Thomas published an article over on the pgEdge blog called [Checkpoints, Write Storms, and You]. Sadly a lot of corporate blogs don’t have comment functionality anymore. I left a few comments [on LinkedIn], but overall let me say this article is a great read, and I’m always happy to see someone dive into an important and overlooked topic, present a good technical description, and include real test results to illustrate the details.

I don’t have any reproducible real test results today. But I have a good story and a little real data.

Check out both of those articles.
