Internals – Curated SQL

A Primer on Data Storage in PostgreSQL

Published 2026-02-24 by Kevin Feasel

The whole idea behind a database is the ability to persist the data. You want your inventory of widgets to get stored so you can look at it later. That means writing out to disks. However, what is writing to disk and where is it being written? Unlike SQL Server which has one (or more) big file for all data, PostgreSQL has a collection of a large number of files. There is a methodology and structure to these files that you need to understand in order to later understand how the data gets written to and retrieved from these files.

While we’re going to be very focused on file, page, folder, etc., throughout this article, that’s just part of the physical nature of persisting your data. What is being persisted is still the logical information you’re most interested in – rows and columns. I just wanted to emphasize the distinction between the two here.

Click through to see how PostgreSQL stores information.

Leave a Comment

The Makeup of an 8KB Page in PostgreSQL

Published 2026-02-20 by Kevin Feasel

Radim Marek takes us through the structure of a page:

If you read previous post about buffers, you already know PostgreSQL might not necessarily care about your rows. You might be inserting a user profile, or retrieving payment details, but all that Postgres works with are blocks of data. 8KB blocks, to be precise. You want to retrieve one tiny row? PostgreSQL hauls an entire 8,192-byte page off the disk just to give it to you. You update a single boolean flag? Same thing. The 8KB page is THE atomic unit of I/O.

But knowing those pages exist isn’t enough. To understand why the database behaves the way it does, you need to understand how it works. Every time you execute INSERT, PostgreSQL needs to figure out how to fit it into one of those 8,192-byte pages.

It is a little wild how three of the largest relational database systems use 8KB pages. I know that, on the SQL Server side, they’ve experimented with different page sizes internally and have repeatedly said that, even recently (the last time I heard this was maybe about 3 years ago at a SQL Saturday), there just isn’t a benefit from moving away from 8KB. But what’s in those 8KB differ, and Radim goes into details on what’s in PostgreSQL.

Leave a Comment

SQL Server Transaction Log Architecture

Published 2026-02-20 by Kevin Feasel

Paul Randal re-continues a series:

In the first part of this series I introduced basic terminology around logging so I recommend you read that before continuing with this post. Everything else I’ll cover in the series requires knowing some of the architecture of the transaction log, so that’s what I’m going to discuss this time. Even if you’re not going to follow the series, some of the concepts I’m going to explain below are worth knowing for everyday tasks that DBAs handle in production.

Note: as I progress through this series and talk about aspects of the log, there are often little edge-cases or weird behaviors in niche circumstances that have been added or changed over the years. I’ll ignore those unless I specifically want to call them out, otherwise the posts would be riddled with rat-holes and mazes of twisty-turny little passages (yes, I loved ’80s text-based adventure games 🙂 that would distract from the main things to learn about in each post.

Click through for a primer on virtual log files, log blocks, and log sequence numbers.

Leave a Comment

Read Efficiency in PostgreSQL Queries

Published 2026-02-18 by Kevin Feasel

Michael Christofides explains what’s happening under the covers:

A lot of the time in database land, our queries are I/O constrained. As such, performance work often involves reducing the number of page reads. Indexes are a prime example, but they don’t solve every issue (a couple of which we’ll now explore).

The way Postgres handles consistency while serving concurrent queries is by maintaining multiple row versions in both the main part of a table (the “heap”) as well as in the indexes (docs). Old row versions take up space, at least until they are no longer needed, and the space can be reused. This extra space is commonly referred to as “bloat”. Below we’ll look into both heap bloat and index bloat, how they can affect query performance, and what you can do to both prevent and respond to issues.

Read on for a detailed explanation.

Leave a Comment

Memory-Optimized Storage Structures in SQL Server

Published 2026-02-17 by Kevin Feasel

Hugo Kornelis digs into another storage structure:

After discussing traditional on-disk rowstore storage in part 1 and columnstores in part 2, it is now time to turn our eye towards memory-optimized storage structures in SQL Server.

Memory-optimized storage was introduced in SQL Server 2014, as part of a project that was codenamed “Hekaton” and later renamed to in-memory OLTP. Whereas columnstore indexes were specifically targeted towards large scale analytical work, Hekaton and memory-optimized tables are specifically geared towards high volume OLTP workloads. By fully eliminating locks and latches, and using precompiled machine code where possible, the processing time of transactions is significantly reduced, allowing for throughput numbers that were previously impossible to achieve.

Read on to learn much more about how SQL Server manages memory-optimized data and the types of operations that are permissible on this internal storage.

Leave a Comment

Transaction ID Locking

Published 2026-02-16 by Kevin Feasel

Hugo Kornelis disentangles two new features in SQL Server 2025:

One of these two features is Transaction ID (TID) Locking. Slated to end the memory waste of thousands of individual row locks, and the concurrency killer of lock escalation. What it is, how does it work, what are the limitations, and do we really get a free lunch?

Click through for the video, though I am firmly wedded to the idea that TANSTAAFL. I say this without spoiling any part of the video.

Leave a Comment

Identifying a Query in Oracle vs PostgreSQL

Published 2026-02-12 by Kevin Feasel

Kellyn Gorman brings in the usual suspects:

“How does the database identify this query and its execution plan?”

Both Oracle and PostgreSQL answer this question, but I find they do it in very different ways, reflecting fundamentally different design philosophies around optimization, observability, and stability. As I dive into this rabbit hole once again, I’m going to reflect on how Oracle’s SQL_ID differs from the query_id in PostgreSQL and how two terms that sound so similar (PLAN_HASH_VALUE and query_hash) could be generated so differently, as well as misinterpreted. I’m guilty of it myself, so it’s a good place to spend some time.

Read on for the answer.

Leave a Comment

Buffers in PostgreSQL

Published 2026-01-27 by Kevin Feasel

Radim Marek goes deep into buffers:

The work around RegreSQL led me to focus a lot on buffers. If you are a casual PostgreSQL user, you have probably heard about adjusting shared_buffers and followed the good old advice to set it to 1/4 of available RAM. But after we went a little bit too enthusiastic about them on a recent Postgres FM episode I’ve been asked what that’s all about.

Buffers are one of those topics that easily gets forgotten. And while they are a foundation block of PostgreSQL’s performance architecture, most of us treat them as a black box. This article is going to attempt to change that.

Read on to learn more about how PostgreSQL users buffers.

Comments closed

How Rowgoals Work in SQL Server

Published 2026-01-19 by Kevin Feasel

Hugo Kornelis has a new video:

For my second vlog, I decided to talk about rowgoals. First an explanation of what they are, then an overview of some obvious and some not so obvious cases where the optimizer will use a rowgoal, and finally a warning about cases where this normally beneficial feature might hurt instead of help.

Click through for part one of a new video series.

Comments closed

Investigating Hash Match Spills to tempdb

Published 2025-12-30 by Kevin Feasel

Hugo Kornelis digs in when you’re overdrawn at the memory bank:

Finding data in tempdb is hard. Not when the data is in objects we created ourselves, such as temporary tables or variables. They are stored in the internal system tables and reflected in various system dynamic management views. But that changes for internal objects. They are only used by the internal logic of, in this case, the Hash Match operator. There is no advantage to storing them in the system tables. When I explored the internals of tables used by Table Spool, Index Spool, and Window Spool, I found out that Microsoft has indeed not bothered to put anything in the system tables for such internal objects. The Hash Match operator is not different in this regard.

But I still found a way to locate this information.

Hugo explains how, though it does include some contrivances to make life a lot easier. I always love this sort of spelunking deep into the bowels of how things work, and Hugo is definitely on the top shelf when it comes to this kind of work.

Comments closed

Category: Internals