Press "Enter" to skip to content

Category: Architecture

The Death of SQL Server, Yet Again

Rebecca Lewis states that rumors of SQL Server’s demise have been greatly exaggerated:

LinkedIn is doing that thing again. ‘Is SQL Server Dead?’ posts are getting traction, the comment sections are lit, and I’m sure a lot of managers out there are forwarding one of them to a DBA with a question mark and no context. It happens every few years. NoSQL was going to kill it. The cloud was going to kill it. Now it’s AI and Fabric. I’ve been doing this for 28 years. Let me save you some time.

No. SQL Server is not dead or even dying. But something is changing — and we need to understand what.

Read on for Rebecca’s take. At the very least, I strongly agree with the general thrust of her argument. And, of course, I would be remiss if I didn’t mention Feasel’s Law here.

What makes this tricky is that there’s definitely more impetus from Microsoft to get people in Azure and using SQL Server-derived products up there. And the stability of the product has definitely taken a hit in the past couple of versions, something I chalk up to the organization removing QA teams back in about 2018. I do think the biggest competition for SQL Server is open-source relational databases, particularly PostgreSQL. At the same time, it’s not just a matter of “Let’s take a weekend and replace database management systems.” For that reason (among others), I do expect to see SQL Server around for quite a while.

And do check out Andy Yun’s comment. It links to a paper I’ve had on my to-read list that a Shop Talk viewer gave me while I was ranting on a somewhat similar topic.

Leave a Comment

Goodhart’s Law and KPIs

Alexander Arvidsson talks metrics:

In 1974, the Soviet Union set production quotas for nails measured in tons.

Factories responded rationally. They produced a small number of enormous, completely useless nails. Problem solved. Quota met.

Moscow reconsidered and switched to quotas measured in number of nails.

Factories responded rationally. They produced millions of tiny, flimsy nails—too small to hold anything together.

The Soviet nail industry was, by any measurable standard, thriving. It was also producing almost nothing of value.

Click through for a detailed description of the problem and Alexander’s push for an answer. I typically summarize Goodhart’s Law in my own words as, any system can be gamed. I think Alexander’s tips on good measurement architecture are sound (and he even mentions gaming systems here), but even with all of those in place, you can still end up with a monkey’s paw of a KPI suite. Even the example of “ensure that users who log in are getting measurable value from the product, as evidenced by X” falls apart in the end because we now maximize for X while hollowing out Y and Z, because nobody’s watching those, as we’re focused on X-maximization. Like, say, “ensure that users who long in are getting measurable value from the product, as evidenced by average order value.” Now we maximize for AOV, streamlining the experience for people making large orders and relentlessly pushing more cart items at the user. The other points aim to mitigate the worst of these results, but it’s an off-the-cuff example of how no system is perfect. But some can still be superior.

Leave a Comment

Page-Level Caching in PostgreSQL with PAX

Laetitia Avrot has a two-parter on PAX. The first part is from a couple of months ago and sets the stage:

Picture this: you walk into a library. Each book is a database record. PostgreSQL’s traditional storage (NSM – N-ary Storage Model) stores complete books on each shelf: chapter 1, chapter 2, chapter 3, all bound together.

Here’s the problem that keeps me up at night: when you need only chapter 3 from 1,000 books, you must pull each complete book off the shelf, flip through chapters 1 and 2 that you’ll never read, grab chapter 3, and move to the next book.

You’re wasting time. You’re wasting energy. You’re wasting cache bandwidth.

But it’s not all roses:

PAX looks elegant on paper: minipages, cache locality, column-oriented access inside an 8KB page. But the moment you ask how this actually would work with Postgres, the complexity arrives fast. NULL values, variable-length types, MVCC, boundary shifts. Let’s go through it all.

To be clear, this is not a product (today). It’s thinking about how to apply the results of an academic paper to Postgres and all of the tricky challenges that can come from what seems to be an easy idea.

Leave a Comment

Role-Playing Dimensions and Direct Lake Semantic Models

Chris Webb finds a workaround to something that used to work:

Back in September 2024 I wrote a blog post on how to create multiple copies of the same dimension in a Direct Lake semantic model without creating copies of the underlying Delta table. Not long after that I started getting comments that people who tried following my instructions were getting errors, and while some bugs were fixed others remained. After asking around I have a workaround (thank you Kevin Moore) that will avoid all those errors, so while we’re waiting for the remaining fixes here are the details of the workaround.

I look at the set of steps needed to do this and say there has to be a better way.

Leave a Comment

Microsoft Fabric ETL and the Air Traffic Controller

Jens Vestergaard rethinks a metaphor:

In February 2025 I wrote about building an event-driven ETL system in Microsoft Fabric. The metaphor was air traffic control: notebooks as flights, Azure Service Bus as the control tower, the Bronze/Silver/Gold medallion layers as the runway sequence. The whole system existed because Fabric has core-based execution limits that throttle how many Spark jobs run simultaneously on a given capacity SKU.

The post was about working around a constraint. You could not just fire all your notebooks at once. You needed something to manage the queue.

More than a year on, it is worth being honest about what held up and what has changed.

Read on to see what has changed in this past year and how Jens thinks of it today.

Leave a Comment

Read-Write Ratios in SQL Server Databases

Louis Davidson resurrects an old article:

When I wrote this post, I had a very strong belief that my OLTP databases are heavy writes and OLAP heavy reads. But in many cases this isn’t exactly the true story. Why? Because you have to read a lot of data to update, delete and display data. Even inserts have reads to look for duplicates, for example. OLAP systems should likely be read more times than write, but loading data can be a lot of data moved around and reads maybe less if the queries are efficient. When I was looking for stuff to post, this post stood out as one that I really liked (other than the aforementioned title!)

I came upon this idea when I found a post by Jason Massie (whose handle was statisticsio back then) that I had referenced, and Kendal Van Dyke alerted me this post at SQL Saturday in Atlanta (this would have been back in 2008 or 2009). After reading that (long gone) post, I wrote this post using sys.dm_db_index_usage_stats, and added references to sys.dm_io_virtual_file_stats as well.

Click through for the script. My expectations going in would be that OLTP and OLAP servers would actually have fairly similar read-write ratios. We talk about OLTP being “write-heavy” and OLAP being “read-heavy” but in practice, you need a lot of write behavior to sustain a warehouse—especially one that isn’t just a “truncate and reload once a day” type of thing—and people care about reading the data from their OLTP systems. In practice, I see the split as more of optimizing for accuracy in data entry (OLTP) versus simplicity of reading data (OLAP).

Leave a Comment

Building the Well-Architected Framework for Fabric

Joey D’Antoni applies Azure principles to Microsoft Fabric:

Let’s take a step back and talk about why I built this session. Like it or not, Microsoft’s intention with Fabric (and Power BI before it) is to make it easier for less-technical business users to build and consume data-driven reports. While I understand this mission, and it has been wildly successful in spreading love for Power BI, despite Fabric’s software-as-a-service branding, it’s actually a fully fledged data engine that needs to be well-managed to ensure data governance, security, and adherence to general best practices. In building my demos, I created a sample workspace with a couple of objects.

Click through for more notes on Joey’s talk, as well as a link to the code.

Leave a Comment

Exploring the Fabric Ontology

Jens Vestergaard takes a peek at Ontologies in Microsoft Fabric:

I have been spending a little time with the Microsoft Fabric data agent documentation lately, and one pattern keeps showing up, and it is not just in the official guidance but in community posts from people who have actually tried to deploy these things: the demo runs beautifully. The AI answers questions in plain English, leadership gets excited, the pilot gets approved. Then it hits production. Real users send real questions. The answers start drifting. Numbers that should match do not. The same question returns different results on different days. Trust evaporates faster than it was built.

And almost every time, the root cause is the same thing: the semantic foundation was not solid enough before anyone pointed an agent at it.

That is exactly the problem the Fabric Ontology is designed to address. It is the piece I think most teams will underestimate right up until the moment they need it.

Click through for an explanation. As I continue learning more about the concept of ontologies (not just in Fabric but in general), I’m slowly coming around to the idea. Though it still reminds me a lot of object-oriented programming with a no-code interface.

Leave a Comment

Thoughts on Database Keys and Constraints

Lee Asher digs into keys:

Keys come in two basic flavors: natural and surrogate. Natural keys, as their name suggests, are values that occur naturally in the data. They have real-world or business meaning: an email address, the street address of a building, the serial number of an inventory item, etc. In contrast, a surrogate key (sometimes called a synthetic key) is a value added to each row upon creation. A surrogate exists only within the confines of the database; it has no meaning outside of it.

A natural key often contains two or more columns. For instance, the key for a table of vehicle types might include the make, model, year, and trim level, all four columns of which must be combined to create the value that uniquely identifies each row. Surrogate keys are always a single column, though the value of each key may be anything you choose – as long as each value is distinct from all others.

One thing I would very strongly note here is that surrogate keys are a physical data model concept. I’m a firm believer that you almost always should have a surrogate key, but there must be a natural key, even if you don’t put an actual constraint on it. Though I do recommend having a unique key constraint on the natural key as well as a primary key constraint on the surrogate key.

Comments closed

The Power of Naming Standards

Louis Davidson covers naming standards:

I am starting a new, huge project soon, using some different tools than I have used before. So the first thing we need to do in this project is to come up with new standards for these new tools, right? No.

Understanding the overall scope and desired results, requirements are first, then a high-level design and architecture plan including what tools one expects to use.

Standards however, are something that you should be in place before you start creating stuff.

I like naming standards a lot. I like them so much, I have a half-dozen of them and use them all at once.

1 Comment