Press "Enter" to skip to content

Category: Architecture

Don’t Use FLOAT for Currency Data

Vlad Drumea has a public service announcement:

In this post I provide some examples as to why FLOAT isn’t the best option for storing exact financial data in SQL Server.

During the last couple of years I’ve seen quite a few cases of FLOAT being used to store financial data in SQL Server.
This isn’t ideal due to how the FLOAT data type behaves in SQL Server, and the official documentation makes a clear note of that.

I’m crotchety enough about this to say that you shouldn’t use any sort of decimal data type (DECIMAL, MONEY, etc.) when dealing with currency data in important financial contexts, either. I reach for BIGINT and divide by 100 (or 10,000 if you keep to four spots after the decimal) for the presentation layer.

For cases in which my data isn’t the source system of record for financial transactions, DECIMAL is fine. I don’t generally like MONEY as a data type in any circumstance, to be honest.
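To make both halves of that advice concrete, here is a quick Python sketch: first the classic binary-floating-point drift that makes FLOAT risky for money, then the integer-cents approach described above. The dollar figures are invented:

```python
# Binary floating point cannot represent most decimal fractions exactly,
# so "obvious" currency arithmetic quietly drifts.
print(0.1 + 0.2 == 0.3)      # False
print(0.1 + 0.2)             # 0.30000000000000004

# The integer-cents approach: store a whole number of cents (BIGINT in
# SQL Server) and divide by 100 only at the presentation layer.
balance_cents = 1999 + 2999  # $19.99 + $29.99, stored as integers
print(balance_cents / 100)   # 49.98 -- exact, no drift
```

The same trick scales to four decimal places by storing ten-thousandths and dividing by 10,000 for display.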


Defining a Data Contract

Buck Woody becomes accountable:

A businessperson pulls a report from a data warehouse, runs the same query they’ve used for two years, and gets a number that doesn’t match what the finance team presented at yesterday’s board meeting. Nobody changed the report. Nobody changed the dashboard. But somewhere upstream, an engineering team renamed a field, shifted a column type, or quietly altered the logic in a pipeline, and nobody thought to mention it because there was no mechanism to mention it.

While we think of this as an engineering failure, it’s more of an implied contract failure. More precisely, it’s the absence of a formal contract. Data contracts are one of the most practical tools a data organization can adopt, and one of the most underused. The idea is not complicated: a data contract is a formal, enforceable agreement between the team that produces data and the team that consumes it. It defines what the data looks like, what quality standards it must meet, who owns it, and what happens when something changes. Think of it as the API layer for your data, the same guarantee a software engineer expects from a well-documented endpoint, applied to the datasets and pipelines your business depends on. This post is about why that matters at the CDO level and how to get them put in place.

Click through to learn more about what data contracts are and why they are useful. This post stays at the architectural level rather than the practitioner level, but lays out why it’s important to think about these sorts of things.
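The idea can be made concrete with a minimal sketch of a data contract as code: the producing team publishes the schema it guarantees, and the consuming side checks rows against it before trusting them. The field names here are invented examples, not anything from the post:

```python
# A data contract reduced to its simplest enforceable form: a published
# schema plus a validation step at the producer/consumer boundary.
CONTRACT = {
    "order_id":    int,
    "order_date":  str,   # ISO 8601 date, e.g. "2025-01-31"
    "total_cents": int,   # currency as whole cents
}

def validate(row, contract=CONTRACT):
    """Return a list of violations; an empty list means the row conforms."""
    problems = []
    for field, expected in contract.items():
        if field not in row:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected):
            problems.append(f"{field}: expected {expected.__name__}, "
                            f"got {type(row[field]).__name__}")
    return problems

# A producer that silently renames total_cents now fails loudly at the
# contract boundary instead of corrupting a board report downstream.
print(validate({"order_id": 1, "order_date": "2025-01-31", "order_total": 4998}))
```

Real implementations hang ownership, quality SLAs, and change-notification rules off the same artifact, but the enforcement point is the same.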


The Death of SQL Server, Yet Again

Rebecca Lewis states that rumors of SQL Server’s demise have been greatly exaggerated:

LinkedIn is doing that thing again. ‘Is SQL Server Dead?’ posts are getting traction, the comment sections are lit, and I’m sure a lot of managers out there are forwarding one of them to a DBA with a question mark and no context. It happens every few years. NoSQL was going to kill it. The cloud was going to kill it. Now it’s AI and Fabric. I’ve been doing this for 28 years. Let me save you some time.

No. SQL Server is not dead or even dying. But something is changing — and we need to understand what.

Read on for Rebecca’s take. At the very least, I strongly agree with the general thrust of her argument. And, of course, I would be remiss if I didn’t mention Feasel’s Law here.

What makes this tricky is that there’s definitely more impetus from Microsoft to get people into Azure and using SQL Server-derived products up there. And the stability of the product has definitely taken a hit in the past couple of versions, something I chalk up to the organization removing QA teams back in about 2018. I do think the biggest competition for SQL Server is open-source relational databases, particularly PostgreSQL. At the same time, it’s not just a matter of “Let’s take a weekend and replace database management systems.” For that reason (among others), I do expect to see SQL Server around for quite a while.

And do check out Andy Yun’s comment. It links to a paper I’ve had on my to-read list that a Shop Talk viewer gave me while I was ranting on a somewhat similar topic.


Goodhart’s Law and KPIs

Alexander Arvidsson talks metrics:

In 1974, the Soviet Union set production quotas for nails measured in tons.

Factories responded rationally. They produced a small number of enormous, completely useless nails. Problem solved. Quota met.

Moscow reconsidered and switched to quotas measured in number of nails.

Factories responded rationally. They produced millions of tiny, flimsy nails—too small to hold anything together.

The Soviet nail industry was, by any measurable standard, thriving. It was also producing almost nothing of value.

Click through for a detailed description of the problem and Alexander’s push for an answer. I typically summarize Goodhart’s Law in my own words as: any system can be gamed. I think Alexander’s tips on good measurement architecture are sound (and he even mentions gaming systems here), but even with all of those in place, you can still end up with a monkey’s paw of a KPI suite. Even the example of “ensure that users who log in are getting measurable value from the product, as evidenced by X” falls apart in the end, because we now maximize for X while hollowing out Y and Z, as nobody’s watching those while we focus on X-maximization. Say X is average order value: now we maximize for AOV, streamlining the experience for people making large orders and relentlessly pushing more cart items at the user. The other points aim to mitigate the worst of these results, but it’s an off-the-cuff example of how no system is perfect. Some can still be superior, though.
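The nail quota story reduces to a toy model: assume each factory meets whichever quota is set using the least labor, and that forging cost has a fixed setup component plus a size component. All numbers below are invented for illustration:

```python
# Goodhart's Law in miniature: the metric is optimized, the goal is not.
def labor_per_nail(weight_kg):
    """Forging cost: a fixed setup cost plus effort proportional to size."""
    return 0.01 + weight_kg / 10

def labor_to_meet(quota_kind, nail_weight_kg):
    """Labor needed to satisfy the quota using nails of one chosen size."""
    if quota_kind == "tons":
        nails_needed = 2_000 / nail_weight_kg  # quota: 2 tons of nails
    else:
        nails_needed = 40_000                  # quota: 40,000 nails
    return nails_needed * labor_per_nail(nail_weight_kg)

# Only the mid-sized nail is actually useful for building anything.
sizes = {"enormous": 500.0, "useful": 0.05, "tiny": 0.001}

cheapest_for_tons  = min(sizes, key=lambda s: labor_to_meet("tons",  sizes[s]))
cheapest_for_count = min(sizes, key=lambda s: labor_to_meet("count", sizes[s]))

print(cheapest_for_tons)   # enormous -- the gamed response to a tonnage quota
print(cheapest_for_count)  # tiny     -- the gamed response to a count quota
```

Under either metric, the rational factory never chooses the useful nail; only a measure of value delivered (or several metrics watched together) would.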


Page-Level Caching in PostgreSQL with PAX

Laetitia Avrot has a two-parter on PAX. The first part is from a couple of months ago and sets the stage:

Picture this: you walk into a library. Each book is a database record. PostgreSQL’s traditional storage (NSM – N-ary Storage Model) stores complete books on each shelf: chapter 1, chapter 2, chapter 3, all bound together.

Here’s the problem that keeps me up at night: when you need only chapter 3 from 1,000 books, you must pull each complete book off the shelf, flip through chapters 1 and 2 that you’ll never read, grab chapter 3, and move to the next book.

You’re wasting time. You’re wasting energy. You’re wasting cache bandwidth.

But it’s not all roses:

PAX looks elegant on paper: minipages, cache locality, column-oriented access inside an 8KB page. But the moment you ask how this actually would work with Postgres, the complexity arrives fast. NULL values, variable-length types, MVCC, boundary shifts. Let’s go through it all.

To be clear, this is not a product (today). It’s thinking about how to apply the results of an academic paper to Postgres and all of the tricky challenges that can come from what seems to be an easy idea.
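For a rough feel of the NSM-versus-PAX difference, here is a toy Python sketch: the same rows laid out both ways, then a single-column scan. The row counts and sizes are invented and this ignores all the hard parts (NULLs, varlena, MVCC) the second post covers:

```python
import struct

rows = [(i, i * 2, i * 3) for i in range(100)]  # 100 rows of three ints

# NSM: whole rows stored one after another (a, b, c, a, b, c, ...)
nsm_page = b"".join(struct.pack("iii", *row) for row in rows)

# PAX: one "minipage" per column inside the same page (a a a ... b b b ... c c c)
pax_page = b"".join(
    struct.pack("i" * len(rows), *(row[col] for row in rows))
    for col in range(3)
)

# Reading only column 2 ("chapter 3 of every book"):
# NSM strides through every 12-byte row to pick out 4 bytes each time...
nsm_col = [struct.unpack_from("i", nsm_page, r * 12 + 8)[0] for r in range(100)]
# ...while PAX reads one contiguous 400-byte minipage in a single pass.
pax_col = list(struct.unpack_from("i" * 100, pax_page, 2 * 400))
assert nsm_col == pax_col
```

Same bytes per page, same rows; the win is that the PAX scan touches one contiguous region, which is what makes CPU caches happy.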


Role-Playing Dimensions and Direct Lake Semantic Models

Chris Webb finds a workaround to something that used to work:

Back in September 2024 I wrote a blog post on how to create multiple copies of the same dimension in a Direct Lake semantic model without creating copies of the underlying Delta table. Not long after that I started getting comments that people who tried following my instructions were getting errors, and while some bugs were fixed others remained. After asking around I have a workaround (thank you Kevin Moore) that will avoid all those errors, so while we’re waiting for the remaining fixes here are the details of the workaround.

I look at the set of steps needed to do this and say there has to be a better way.


Microsoft Fabric ETL and the Air Traffic Controller

Jens Vestergaard rethinks a metaphor:

In February 2025 I wrote about building an event-driven ETL system in Microsoft Fabric. The metaphor was air traffic control: notebooks as flights, Azure Service Bus as the control tower, the Bronze/Silver/Gold medallion layers as the runway sequence. The whole system existed because Fabric has core-based execution limits that throttle how many Spark jobs run simultaneously on a given capacity SKU.

The post was about working around a constraint. You could not just fire all your notebooks at once. You needed something to manage the queue.

More than a year on, it is worth being honest about what held up and what has changed.

Read on to see what has changed in this past year and how Jens thinks of it today.


Read-Write Ratios in SQL Server Databases

Louis Davidson resurrects an old article:

When I wrote this post, I had a very strong belief that my OLTP databases are heavy writes and OLAP heavy reads. But in many cases this isn’t exactly the true story. Why? Because you have to read a lot of data to update, delete and display data. Even inserts have reads to look for duplicates, for example. OLAP systems should likely be read more times than write, but loading data can be a lot of data moved around and reads maybe less if the queries are efficient. When I was looking for stuff to post, this post stood out as one that I really liked (other than the aforementioned title!)

I came upon this idea when I found a post by Jason Massie (whose handle was statisticsio back then) that I had referenced, and Kendal Van Dyke alerted me to this post at SQL Saturday in Atlanta (this would have been back in 2008 or 2009). After reading that (long gone) post, I wrote this post using sys.dm_db_index_usage_stats, and added references to sys.dm_io_virtual_file_stats as well.

Click through for the script. My expectations going in would be that OLTP and OLAP servers would actually have fairly similar read-write ratios. We talk about OLTP being “write-heavy” and OLAP being “read-heavy” but in practice, you need a lot of write behavior to sustain a warehouse—especially one that isn’t just a “truncate and reload once a day” type of thing—and people care about reading the data from their OLTP systems. In practice, I see the split as more of optimizing for accuracy in data entry (OLTP) versus simplicity of reading data (OLAP).
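For reference, the usual arithmetic applied to sys.dm_db_index_usage_stats counters treats seeks + scans + lookups as reads and updates as writes. Here is that calculation in Python, with invented sample counter values:

```python
def read_write_ratio(user_seeks, user_scans, user_lookups, user_updates):
    """Read/write split from index usage counters: reads are seeks +
    scans + lookups; writes are updates."""
    reads = user_seeks + user_scans + user_lookups
    writes = user_updates
    total = reads + writes
    return (reads / total, writes / total) if total else (0.0, 0.0)

# Even an "OLTP-looking" index tends to skew heavily toward reads.
r, w = read_write_ratio(user_seeks=80_000, user_scans=500,
                        user_lookups=4_500, user_updates=15_000)
print(f"{r:.0%} read / {w:.0%} write")   # 85% read / 15% write
```

One caveat from the DMV's semantics: an update counts once no matter how many rows it touches, so the ratio measures operations, not data volume, which is where sys.dm_io_virtual_file_stats complements it.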


Building the Well-Architected Framework for Fabric

Joey D’Antoni applies Azure principles to Microsoft Fabric:

Let’s take a step back and talk about why I built this session. Like it or not, Microsoft’s intention with Fabric (and Power BI before it) is to make it easier for less-technical business users to build and consume data-driven reports. While I understand this mission, and it has been wildly successful in spreading love for Power BI, despite Fabric’s software-as-a-service branding, it’s actually a fully fledged data engine that needs to be well-managed to ensure data governance, security, and adherence to general best practices. In building my demos, I created a sample workspace with a couple of objects.

Click through for more notes on Joey’s talk, as well as a link to the code.


Exploring the Fabric Ontology

Jens Vestergaard takes a peek at Ontologies in Microsoft Fabric:

I have been spending a little time with the Microsoft Fabric data agent documentation lately, and one pattern keeps showing up, and it is not just in the official guidance but in community posts from people who have actually tried to deploy these things: the demo runs beautifully. The AI answers questions in plain English, leadership gets excited, the pilot gets approved. Then it hits production. Real users send real questions. The answers start drifting. Numbers that should match do not. The same question returns different results on different days. Trust evaporates faster than it was built.

And almost every time, the root cause is the same thing: the semantic foundation was not solid enough before anyone pointed an agent at it.

That is exactly the problem the Fabric Ontology is designed to address. It is the piece I think most teams will underestimate right up until the moment they need it.

Click through for an explanation. As I continue learning more about the concept of ontologies (not just in Fabric but in general), I’m slowly coming around to the idea. Though it still reminds me a lot of object-oriented programming with a no-code interface.
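To illustrate that OOP comparison, here is a rough Python sketch of an ontology as declared entities and typed relationships, defined once so every consumer (human or agent) shares the same model. All names are invented and this is not Fabric's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A business concept with named, typed properties."""
    name: str
    properties: dict = field(default_factory=dict)

@dataclass
class Relation:
    """A typed edge between two entities."""
    source: str
    kind: str
    target: str

ontology = {
    "entities": [
        Entity("Customer", {"customer_id": "int", "region": "str"}),
        Entity("Order",    {"order_id": "int", "total_cents": "int"}),
    ],
    "relations": [
        Relation("Customer", "places", "Order"),
    ],
}

# An agent can now answer "what connects customers to orders?" from the
# declared model instead of guessing from column names.
links = [r.kind for r in ontology["relations"]
         if r.source == "Customer" and r.target == "Order"]
print(links)   # ['places']
```

The resemblance to classes and associations is why the OOP comparison keeps coming up; the difference is that the definitions live with the data platform rather than inside any one application.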
