
Author: Kevin Feasel

Page-Level Caching in PostgreSQL with PAX

Laetitia Avrot has a two-parter on PAX. The first part is from a couple of months ago and sets the stage:

Picture this: you walk into a library. Each book is a database record. PostgreSQL’s traditional storage (NSM – N-ary Storage Model) stores complete books on each shelf: chapter 1, chapter 2, chapter 3, all bound together.

Here’s the problem that keeps me up at night: when you need only chapter 3 from 1,000 books, you must pull each complete book off the shelf, flip through chapters 1 and 2 that you’ll never read, grab chapter 3, and move to the next book.

You’re wasting time. You’re wasting energy. You’re wasting cache bandwidth.

But the second part shows that it’s not all roses:

PAX looks elegant on paper: minipages, cache locality, column-oriented access inside an 8KB page. But the moment you ask how this actually would work with Postgres, the complexity arrives fast. NULL values, variable-length types, MVCC, boundary shifts. Let’s go through it all.

To be clear, this is not a product (today). It’s thinking about how to apply the results of an academic paper to Postgres and all of the tricky challenges that can come from what seems to be an easy idea.


ANY_VALUE() in Fabric Data Warehouse

Jovan Popovic notes a feature going GA:

Fabric Data Warehouse now supports the ANY_VALUE() aggregate, making it easier to write readable, efficient T-SQL when you want to group by a key but still return descriptive columns that are functionally the same for every row in the group.

Right now, this is only available in the Fabric Data Warehouse, so no Azure SQL DB, Managed Instance, or box product support at this time.


Privilege Escalation via the DatabaseManager Role

Andreas Wolter looks into a security issue:

First of all: full credit to Emad Al-Mousa for identifying and publishing two privilege escalation paths.

His findings show how members of the ##MS_DatabaseManager## server-role in SQL Server can escalate to sysadmin by chaining existing functionality in unexpected ways.

He has published his findings responsibly after more than 8 months’ notice to MSRC: Microsoft SQL Server Privilege Elevation Through ##MS_DatabaseManager## Role [CVE-2025-24999]

Read on to learn more about the issue.

Microsoft fixed a separate privilege escalation issue but has updated the documentation for ##MS_DatabaseManager## to indicate that “Members of this role can potentially elevate their privileges under certain conditions. Apply the principle of least privilege when assigning this role and monitor all activity performed by its members.”


Word Order and Constraint Naming

Andy Levy is looking for a name:

Ten years (and a couple jobs) ago, I wrote about naming default constraints to avoid having SQL Server name them for you. I closed with the following statement:

SQL Server needs a name for the constraint regardless; it’s worth specifying it yourself.

We’re back with a new wrinkle in the story.

Read on for an interesting scenario where Andy very clearly named a constraint, yet the name didn’t take.


Updating a Mean without Original Data Points

John Cook has an interesting solution:

This post will look at the problem of updating an average grade as a very simple special case of Bayesian statistics and of Kalman filtering.

Suppose you’re keeping up with your average grade in a class, and you know your average after n tests, all weighted equally.
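As a rough illustration of the idea (a minimal sketch, not necessarily the form used in the linked post), the new average can be computed from just the old average, the count of tests so far, and the new score. The function name `update_mean` and the sample numbers are made up for this example.

```python
def update_mean(mean: float, n: int, new_score: float) -> float:
    """Return the average after one more equally weighted score.

    mean      -- average of the first n scores
    n         -- number of scores already included in mean
    new_score -- the (n + 1)-th score
    """
    # Move the old mean toward the new score by 1 / (n + 1) of the gap.
    return mean + (new_score - mean) / (n + 1)


# Hypothetical grades: four tests averaging 80, then a 90 on the fifth.
updated = update_mean(80.0, 4, 90.0)
```

The only state to keep is the running mean and the count; the individual scores are never needed again.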

Click through for the walkthrough. This is similar to something I tried to puzzle out before ultimately admitting defeat: is there a way to calculate updates to the median without needing to know the entire set? In practical terms, this would be something like: how many pieces of information do I need to guarantee that I can maintain a median over time?

The best I could come up with was built on the premise that new data points are about as likely to fall below the median as above it, so each greater-lesser pair cancels out. If you have roughly equal numbers of new data points on each side, your “elements of the median” array can be pretty small. But the problem is, for any sufficiently small k, where k represents the number of elements you keep in memory, it is possible for a localized collection of (without loss of generality) lower-than-median data points to come in and completely wash out your memory. For example, if you kept 3 points in memory and four values below the median arrive, you no longer know what the median is.

Trying to solve this without knowing the shape of the distribution or making any sequencing assumptions is something that I failed to do.
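To make the wash-out concrete, here is a minimal sketch of one way such a scheme might look. It is an assumption for illustration, not a known algorithm from the linked post: the class name `WindowedMedian`, the eviction rule, and the use of the lower median are all invented here. It keeps at most k consecutive sorted values around the median plus exact counts of everything below and above that window.

```python
from bisect import insort


class WindowedMedian:
    """Track a median using at most k stored values (hypothetical scheme)."""

    def __init__(self, initial, k):
        # Assumes `initial` is the whole data set so far, is non-empty,
        # and has at most k values.
        self.k = k
        self.window = sorted(initial)
        self.below = 0  # count of values known to lie below the window
        self.above = 0  # count of values known to lie above the window

    def _local_index(self):
        # Position of the lower median relative to the start of the window.
        n = self.below + len(self.window) + self.above
        return (n - 1) // 2 - self.below

    def add(self, x):
        if x < self.window[0]:
            self.below += 1  # only the count survives, not the value
        elif x > self.window[-1]:
            self.above += 1
        else:
            insort(self.window, x)
            if len(self.window) > self.k:
                # Evict the end that is farther from the median position.
                if self._local_index() < len(self.window) / 2:
                    self.window.pop()
                    self.above += 1
                else:
                    self.window.pop(0)
                    self.below += 1

    def median(self):
        i = self._local_index()
        # None means the median has drifted outside the stored values.
        return self.window[i] if 0 <= i < len(self.window) else None


balanced = WindowedMedian([10, 20, 30], k=3)
for value in [1, 99, 2, 98]:  # lows and highs arrive in pairs
    balanced.add(value)

washed_out = WindowedMedian([10, 20, 30], k=3)
for value in [1, 2, 3, 4]:  # a run of four low values
    washed_out.add(value)
```

With the paired stream, `balanced.median()` is still 20. After the run of four low values, `washed_out.median()` is `None`: the true median is one of the low values that was counted but not stored.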


Tracking Resource Utilization in Performance Monitor

Erik Darling has another video tutorial:

In this video, I delve into the resource metrics tabs of the full dashboard from my free SQL Server performance monitoring tool. I explain how these tabs provide more detailed information compared to the lighter version, including CPU utilization, TempDB usage, memory usage, and a selection of perfmon counters. The goal is to help you quickly identify when your server might be experiencing issues by showing real-time data and historical trends that can pinpoint specific performance bottlenecks. I also highlight how the wait stats section helps in understanding wait statistics, making it easier to diagnose problems related to resource contention and query compilation.

Click through for the video.


Role-Playing Dimensions and Direct Lake Semantic Models

Chris Webb finds a workaround to something that used to work:

Back in September 2024 I wrote a blog post on how to create multiple copies of the same dimension in a Direct Lake semantic model without creating copies of the underlying Delta table. Not long after that I started getting comments that people who tried following my instructions were getting errors, and while some bugs were fixed others remained. After asking around I have a workaround (thank you Kevin Moore) that will avoid all those errors, so while we’re waiting for the remaining fixes here are the details of the workaround.

I look at the set of steps needed to do this and say there has to be a better way.


Automatic Index Compaction

Rebecca Lewis looks at a new Azure SQL Database preview feature:

Microsoft’s announcement of Automatic Index Compaction is titled ‘Stop defragmenting and start living’. That is not an accident. Brent Ozar has been making the case for years that defragmenting indexes is largely maintenance theater — that external fragmentation barely matters on modern SSDs and shared storage and that nightly rebuild jobs hammer your transaction log and I/O for gains that are difficult to measure.

His sessions on the topic have been circulating for over a decade, and now Microsoft’s own documentation states it plainly: ‘For most workloads, a higher index fragmentation doesn’t affect query performance or resource consumption.’ I believe that may be Brent’s argument almost verbatim in their official docs.

This could be interesting.

By the way, if you want a really deep dive on index maintenance, I’ll point back to a pair of sessions Jeff Moden did for TriPASS (the Triangle Area SQL Server User Group that I run) about 5 years back and was gracious enough to let us record. They are very long user group sessions but go into detail on exactly what kinds of index write patterns benefit from rebuilds and which ones don’t, as well as a lot more.
