Press "Enter" to skip to content

Category: Internals

Dealing with Dirty Pages in PostgreSQL

Umair Shahid explains what dirty pages are in PostgreSQL:

PostgreSQL stores data in fixed‑size blocks (pages), normally 8 KB. When a client updates or inserts data, PostgreSQL does not immediately write those changes to disk. Instead, it loads the affected page into shared memory (shared buffers), makes the modification there, and marks the page as dirty. A “dirty page” means the version of that page in memory is newer than the on‑disk copy.

Before returning control to the client, PostgreSQL records the change in the Write‑Ahead Log (WAL), ensuring durability even if the database crashes. However, the actual table file isn’t updated until a checkpoint or background writer flushes the dirty page to disk.

The concept is the same in SQL Server. Read on to see how PostgreSQL manages dirty pages and some of the issues you might run into with them.

Leave a Comment

Local Variables and Cardinality Estimates in SQL Server

Erik Darling lays out an explanation of what SQL Server does to estimate the cardinality of queries using local variables. Erik covers some of the conceptual questions around how a compiler could work, and then explains what SQL Server does.

Erik talks about dependency on specific behavior and how people get upset when things change, and he’s absolutely right. My favorite example of this is the “quirky update” technique people used back in the day prior to SQL Server 2012 to calculate running totals. That technique was something that relied on unsupported, accidental but repeatable behavior around creating clustered indexes on temp tables. Even though the SQL Server team explicitly stated that this behavior could change at any time, it was clear that they didn’t go out of their way to break that functionality.

Leave a Comment

Microsoft Fabric Direct Lake Join Index Creation

Phil Seamark explains a recent change:

If you’ve been working with Direct Lake in Microsoft Fabric, you’ll know its magic resides in its ability to quickly load data. It loads data into semantic models from OneLake when needed. This feature eliminates the overhead of importing. But until recently, the first query on a cold cache might feel sluggish. Why? One reason for this is that Direct Lake must build a join index. This index is added to the model during the first query. This index is a critical structure that maps relationships between tables for efficient lookups.

Earlier, this process was single-threaded and slow, especially on large tables with high cardinality. The good news? That’s changed.

Read on to see how, what a join index is, and what this impact looks like in practice.

Leave a Comment

Inspecting the Postgres Write-Ahead Log

Henrietta Dombrovskaya digs into the write-ahead log:

First, when the users fixed one of the primary suspects jobs, the situation with WAL growth didn’t change. Second, the rate of the growth couldn’t be explained by these suboptimal jobs: the data volumes they were removing and reloading were still magnitudes smaller than the WAL size we were dealing with. Finally, I decided to do what I should have done from the start – to take a look at what exactly was in these super-fast growing WALs.

Read on to learn what Henrietta found. Also check out the comments for some additional context.

Comments closed

Generating and Recognizing Hash Collisions in SQL Server

Hugo Kornelis continues a deep dive into hash tables in SQL Server:

Welcome back! In the previous parts, I first showed how a Hash Match (Left Outer Join) can give insight in the order of data in a hash table, and then used that trick to obtain and verify some interesting insights into the internal structure of such a table. It’s now time to see if this same trick can also be used to find hash collisions.

Hugo finds some interesting results along the way, so check it out.

Comments closed

SLRUs and MultiXacts in Postgres

Andrew Atkinson digs in:

The main purpose of SLRUs is to track metadata about Postgres transactions.

SLRUs are a general mechanism used by multiple types. Like a lot of things in Postgres, the SLRU system is extensible which means extensions can create new types.

The “least recently used” aspect might be recognizable from cache systems. LRU refers to how the oldest items are evicted from the cache when it’s full, and newer items take their place. This is because the cache has a fixed amount of space (measured in 8KB pages) and thus can only store a fixed amount of items.

Read on to learn more about these two concepts and how things have changed in Postgres 17.

Comments closed

The Internals of a Hash Table

Hugo Kornelis digs deep:

In part 1 of this series, I laid the foundation to explore the structure of the hash table, as used by the Hash Match operator, by alleging and then proving that a Hash Match (Left Outer Join) returns unmatched rows from the build input in the order in which they are stored in the hash table. This means that we can create queries on carefully curated data to gain insight in the structure of that hash table.

It is now time to use that trick to actually start to explore the hash table. But not without also looking at available documentation and common sense.

Click through for a waltz down memory lane, a graphical interpretation of a hash table, and some tests to see if Hugo is correct.

Comments closed

Join Operator Properties and Query Performance

Andy Brownsword takes a closer look at the big three join operators in SQL Server:

In the previous post I explained Join Operators in SQL Server. Whilst compiling that I dug a little deeper and came across a few interesting points I thought were worth sharing.

Let’s look at behaviour of the operators which may occur under specific conditions. Hopefully you find them as interesting as I did:

Click through for an interesting tidbit about nested loops joins, merge joins, and hash joins.

Comments closed

A Primer on Join Operators

Andy Brownsword takes a peek at the three most common types of join operators, plus a bonus:

When reviewing our execution plans we’ll see joins executed using different operators. The type of operator is chosen based on the data that’s available to join and how the optimiser wants to execute it.

In this post we’ll take a look at what the operators are, when they are used, and how they work. These are the operators we’ll cover:

  • Nested Loop Joins
  • Merge Joins
  • Hash Match Joins
  • (Bonus) Adaptive Joins

Read on for a quick overview of which works best when.

Comments closed