Press "Enter" to skip to content

Category: Internals

An Overview of PostgreSQL Internals

Elizabeth Christensen shows some of the ways to view internal information in PostgreSQL:

Postgres has an awesome amount of data collected in its own internal tables. Postgres hackers know all about this  – but software developers and folks working with day to day Postgres tasks often miss out the good stuff.

The Postgres catalog is how Postgres keeps track of itself. Of course, Postgres would do this in a relational database with its own schema. Throughout the years several nice features have been added to the internal tables like psql tools and views that make navigating Postgres’ internal tables even easier.

Today I want to walk through some of the most important Postgres internal data catalog details. What they are, what is in them, and how they might help you understand more about what is happening inside your database.

Click through for an overview of catalog tables and catalog views (similar to SQL Server’s system tables and Dynamic Management Views, respectively).

Leave a Comment

Dealing with Dirty Pages in PostgreSQL

Umair Shahid explains what dirty pages are in PostgreSQL:

PostgreSQL stores data in fixed‑size blocks (pages), normally 8 KB. When a client updates or inserts data, PostgreSQL does not immediately write those changes to disk. Instead, it loads the affected page into shared memory (shared buffers), makes the modification there, and marks the page as dirty. A “dirty page” means the version of that page in memory is newer than the on‑disk copy.

Before returning control to the client, PostgreSQL records the change in the Write‑Ahead Log (WAL), ensuring durability even if the database crashes. However, the actual table file isn’t updated until a checkpoint or background writer flushes the dirty page to disk.

The concept is the same in SQL Server. Read on to see how PostgreSQL manages dirty pages and some of the issues you might run into with them.

Leave a Comment

Local Variables and Cardinality Estimates in SQL Server

Erik Darling lays out an explanation of what SQL Server does to estimate the cardinality of queries using local variables. Erik covers some of the conceptual questions around how a compiler could work, and then explains what SQL Server does.

Erik talks about dependency on specific behavior and how people get upset when things change, and he’s absolutely right. My favorite example of this is the “quirky update” technique people used back in the day prior to SQL Server 2012 to calculate running totals. That technique was something that relied on unsupported, accidental but repeatable behavior around creating clustered indexes on temp tables. Even though the SQL Server team explicitly stated that this behavior could change at any time, it was clear that they didn’t go out of their way to break that functionality.

Leave a Comment

Microsoft Fabric Direct Lake Join Index Creation

Phil Seamark explains a recent change:

If you’ve been working with Direct Lake in Microsoft Fabric, you’ll know its magic resides in its ability to quickly load data. It loads data into semantic models from OneLake when needed. This feature eliminates the overhead of importing. But until recently, the first query on a cold cache might feel sluggish. Why? One reason for this is that Direct Lake must build a join index. This index is added to the model during the first query. This index is a critical structure that maps relationships between tables for efficient lookups.

Earlier, this process was single-threaded and slow, especially on large tables with high cardinality. The good news? That’s changed.

Read on to see how, what a join index is, and what this impact looks like in practice.

Leave a Comment

Inspecting the Postgres Write-Ahead Log

Henrietta Dombrovskaya digs into the write-ahead log:

First, when the users fixed one of the primary suspects jobs, the situation with WAL growth didn’t change. Second, the rate of the growth couldn’t be explained by these suboptimal jobs: the data volumes they were removing and reloading were still magnitudes smaller than the WAL size we were dealing with. Finally, I decided to do what I should have done from the start – to take a look at what exactly was in these super-fast growing WALs.

Read on to learn what Henrietta found. Also check out the comments for some additional context.

Comments closed

Generating and Recognizing Hash Collisions in SQL Server

Hugo Kornelis continues a deep dive into hash tables in SQL Server:

Welcome back! In the previous parts, I first showed how a Hash Match (Left Outer Join) can give insight in the order of data in a hash table, and then used that trick to obtain and verify some interesting insights into the internal structure of such a table. It’s now time to see if this same trick can also be used to find hash collisions.

Hugo finds some interesting results along the way, so check it out.

Comments closed

SLRUs and MultiXacts in Postgres

Andrew Atkinson digs in:

The main purpose of SLRUs is to track metadata about Postgres transactions.

SLRUs are a general mechanism used by multiple types. Like a lot of things in Postgres, the SLRU system is extensible which means extensions can create new types.

The “least recently used” aspect might be recognizable from cache systems. LRU refers to how the oldest items are evicted from the cache when it’s full, and newer items take their place. This is because the cache has a fixed amount of space (measured in 8KB pages) and thus can only store a fixed amount of items.

Read on to learn more about these two concepts and how things have changed in Postgres 17.

Comments closed

The Internals of a Hash Table

Hugo Kornelis digs deep:

In part 1 of this series, I laid the foundation to explore the structure of the hash table, as used by the Hash Match operator, by alleging and then proving that a Hash Match (Left Outer Join) returns unmatched rows from the build input in the order in which they are stored in the hash table. This means that we can create queries on carefully curated data to gain insight in the structure of that hash table.

It is now time to use that trick to actually start to explore the hash table. But not without also looking at available documentation and common sense.

Click through for a waltz down memory lane, a graphical interpretation of a hash table, and some tests to see if Hugo is correct.

Comments closed

Join Operator Properties and Query Performance

Andy Brownsword takes a closer look at the big three join operators in SQL Server:

In the previous post I explained Join Operators in SQL Server. Whilst compiling that I dug a little deeper and came across a few interesting points I thought were worth sharing.

Let’s look at behaviour of the operators which may occur under specific conditions. Hopefully you find them as interesting as I did:

Click through for an interesting tidbit about nested loops joins, merge joins, and hash joins.

Comments closed