Press "Enter" to skip to content

Category: Internals

Storage of XML Indexes in SQL Server

Hugo Kornelis continues a series on storage structures:

After covering on-disk rowstorecolumnstore indexesmemory-optimized storage, and memory-optimized columnstores, it is now time to turn our attention to storage structures that are used for specific datatypes only. The first “victim” will be the XML index.

When you need to store XML data in a SQL Server database, you can choose between two data types, each with their pros and cons. You can choose to store the XML data as nvarchar(max). This preserves the exact content of the XML as you received it, which might be required for legal reasons. However, any query that cares about the specific content of the XML data will have to resort to very complicated string expressions.

The xml data type shreds the XML and stores the content in an internal format, that allows SQL Server to work with for instance XQuery or XPath expressions. This format also saves space as compared to the nvarchar(max) alternative. However, when you query the data, the content will be the same, but formatting and whitespace might be different.

The latter also allow you to build indexes on top of them, and those indexes are what Hugo covers.

Leave a Comment

Log Records in SQL Server

Paul Randal explains what a log record is:

The simplest definition of a log record: it describes a single change to a database. A single operation in the database may cause multiple changes, but each change will usually have its own log record to describe it. An example of this is updating a column in a single row—it will do the following:

Read on to see what it will do, what it looks like, and what kinds of log records exist.

Leave a Comment

Comparative Query Analysis and Planning

Christophe Pettus has a two-parter. The first post covers how a half-dozen data platform technologies handle cost-based analysis:

PostgreSQL has ANALYZE. You run it (or autovacuum runs it for you), it draws a sample of 300 × default_statistics_target rows, and it writes a row per column into pg_statistic: a null fraction, an n-distinct estimate, a most-common-values list, an equi-depth histogram, and a physical-vs-logical correlation. The planner reads those numbers, multiplies selectivities together, costs a handful of join strategies, and picks one. Three join algorithms are on the menu: nested loop, merge join, hash join.

That is the entire shape of the problem, and every cost-based optimizer ever shipped solves the same one. They differ in three places, and only three: where the numbers come from, how stale the numbers are allowed to get, and which plan shapes are even legal to choose between. The algorithms are the boring part. Everybody hash-joins. The interesting part is the bookkeeping.

Then there’s how each of the systems generates a query plan:

Statistics are the input. Planning is what the database does with them: it takes a declarative query, which describes what you want and says nothing about how, and turns it into an executable plan, which is nothing but how. There are two jobs inside that. First, rewrite the query into a logically equivalent but more tractable shape, which is where subquery flattening, predicate pushdown, and view merging live. Second, search the space of physical plans (join orders, join algorithms, access paths) for the cheapest one the cost model can find. The second job is the hard one, because the number of possible join orders for a query grows faster than anyone wants to contemplate, and every database in this article is, underneath, a strategy for not enumerating all of them.

Two questions separate the six systems here. How does each one tame that search space? And once it has an answer, how much will it let you argue with the result? Those sound like the same question. They are not, and the most useful thing this comparison does is pull them apart. A database can search brilliantly and refuse you any override at all (Snowflake), search crudely and hand you a fistful of hints anyway (MySQL until recently), or search hard and expose every lever ever machined (Oracle). Sophistication of the search and generosity of the control surface are independent axes. Knowing where a system sits on each tells you most of what its planner feels like to live with.

Slightly odd is that there’s a section of DB2 but not on SQL Server. But it is a good cross-comparison of several of the top relational database options.

Leave a Comment

Bloom Filters with Valkey

Jay Miller checks for a record:

A bloom filter is a small, probabilistic data structure designed to answer one question: “Have I seen this item before?” It provides two potential answers: Absolutely Not, and Probably.

You may think that 100% Yes or No would be better but here’s the thing, probably is really fast and you’re really concerned about the Absolutely Not’s taking up unnecessary connections.

The article speaks to Aiven’s implementation of a Bloom filter in Valkey, but does get into some neat details on bloom filters in general. And if you want to go further down that route, Paul White explains how SQL Server uses Bloom filters.

Leave a Comment

The Pain of Moving Indexes between Filegroups

Erik Darling explains a process:

At some point you’re going to want to move some indexes to a new filegroup. Maybe you’re separating data across storage, maybe you’re cleaning up after someone who put everything on PRIMARY and walked away, maybe you’ve got your reasons and they’re none of my business.

Whatever the cause, you’d think this would be a solved problem in a database that’s been around since the Clinton administration.

It is not.

Some days, I’m convinced that the only way to win is not to play at all. Erik explaining how to migrate LOB data across filegroups fits that bill perfectly.

Leave a Comment

Postgres Performance on Linux 7.0 and Benchmark Artifacts

Christophe Pettus digs into a finding:

A benchmark came out of AWS earlier this month showing PostgreSQL throughput on Linux 7.0 dropping to 0.51x what the same workload produced on Linux 6.x. The Phoronix headline wrote itself. Hacker News did what Hacker News does. By the end of the week, I had been asked by three separate clients whether they needed to hold their kernel upgrades.

They don’t. Almost nobody does.

Read on for the answer, including a dive into the nature of the problem and a quick-and-easy fix for most cases.

Comments closed

What’s in a SQL Server File Header

Anthony Nocentino goes poking around:

I’ve been doing a deep dive into SQL Server on-disk structures lately, and one of my favorite rabbit holes is revisiting Paul Randal’s series on file header pages. If you haven’t read it, go do that now. It covers what file header pages are, what they contain, and what happens when they corrupt. This post takes that concept and runs with it. I’ll use DBCC FILEHEADER to read the file header of every user database file on a server and answer a question that comes up more than you’d think: can you determine which files belong together as a database purely from the file header, without querying sys.databases?

Read on for that answer, as well as what you cannot do with DBCC FILEHEADER.

Comments closed

A Primer on Data Storage in PostgreSQL

Grant Fritchey shares some thoughts:

The whole idea behind a database is the ability to persist the data. You want your inventory of widgets to get stored so you can look at it later. That means writing out to disks. However, what is writing to disk and where is it being written? Unlike SQL Server which has one (or more) big file for all data, PostgreSQL has a collection of a large number of files. There is a methodology and structure to these files that you need to understand in order to later understand how the data gets written to and retrieved from these files.

While we’re going to be very focused on file, page, folder, etc., throughout this article, that’s just part of the physical nature of persisting your data. What is being persisted is still the logical information you’re most interested in – rows and columns. I just wanted to emphasize the distinction between the two here.

Click through to see how PostgreSQL stores information.

Comments closed

The Makeup of an 8KB Page in PostgreSQL

Radim Marek takes us through the structure of a page:

If you read previous post about buffers, you already know PostgreSQL might not necessarily care about your rows. You might be inserting a user profile, or retrieving payment details, but all that Postgres works with are blocks of data. 8KB blocks, to be precise. You want to retrieve one tiny row? PostgreSQL hauls an entire 8,192-byte page off the disk just to give it to you. You update a single boolean flag? Same thing. The 8KB page is THE atomic unit of I/O.

But knowing those pages exist isn’t enough. To understand why the database behaves the way it does, you need to understand how it works. Every time you execute INSERT, PostgreSQL needs to figure out how to fit it into one of those 8,192-byte pages.

It is a little wild how three of the largest relational database systems use 8KB pages. I know that, on the SQL Server side, they’ve experimented with different page sizes internally and have repeatedly said that, even recently (the last time I heard this was maybe about 3 years ago at a SQL Saturday), there just isn’t a benefit from moving away from 8KB. But what’s in those 8KB differ, and Radim goes into details on what’s in PostgreSQL.

Comments closed

SQL Server Transaction Log Architecture

Paul Randal re-continues a series:

In the first part of this series I introduced basic terminology around logging so I recommend you read that before continuing with this post. Everything else I’ll cover in the series requires knowing some of the architecture of the transaction log, so that’s what I’m going to discuss this time. Even if you’re not going to follow the series, some of the concepts I’m going to explain below are worth knowing for everyday tasks that DBAs handle in production.

Note: as I progress through this series and talk about aspects of the log, there are often little edge-cases or weird behaviors in niche circumstances that have been added or changed over the years. I’ll ignore those unless I specifically want to call them out, otherwise the posts would be riddled with rat-holes and mazes of twisty-turny little passages (yes, I loved ’80s text-based adventure games 🙂 that would distract from the main things to learn about in each post.

Click through for a primer on virtual log files, log blocks, and log sequence numbers.

Comments closed