
Category: Architecture

Microsoft Fabric ETL and the Air Traffic Controller

Jens Vestergaard rethinks a metaphor:

In February 2025 I wrote about building an event-driven ETL system in Microsoft Fabric. The metaphor was air traffic control: notebooks as flights, Azure Service Bus as the control tower, the Bronze/Silver/Gold medallion layers as the runway sequence. The whole system existed because Fabric has core-based execution limits that throttle how many Spark jobs run simultaneously on a given capacity SKU.

The post was about working around a constraint. You could not just fire all your notebooks at once. You needed something to manage the queue.

More than a year on, it is worth being honest about what held up and what has changed.

Read on to see what has changed over the past year and how Jens thinks about it today.
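The control-tower idea can be sketched in miniature with a semaphore that caps how many jobs run at once. Everything below is invented for illustration (the notebook names, the two-slot capacity); it is a toy stand-in, not Jens's actual dispatcher, which used Azure Service Bus:

```python
import threading
import time

# Hypothetical illustration: suppose the capacity SKU allows only
# MAX_CONCURRENT Spark jobs, so a dispatcher (the "control tower")
# gates notebook runs instead of firing them all at once.
MAX_CONCURRENT = 2
capacity = threading.Semaphore(MAX_CONCURRENT)
completed = []
completed_lock = threading.Lock()

def run_notebook(name: str) -> None:
    """Simulate one notebook 'flight' waiting for a runway slot."""
    with capacity:            # blocks until a core slot frees up
        time.sleep(0.01)      # stand-in for the Spark job itself
        with completed_lock:
            completed.append(name)

flights = [f"bronze_{i}" for i in range(5)]
threads = [threading.Thread(target=run_notebook, args=(f,)) for f in flights]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All five flights land, but at most two were ever airborne at once.
print(sorted(completed))
```

The same shape scales up: swap the semaphore for a message queue and the threads for notebook triggers, and you have the skeleton of the original design.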


Read-Write Ratios in SQL Server Databases

Louis Davidson resurrects an old article:

When I wrote this post, I had a very strong belief that my OLTP databases were write-heavy and my OLAP databases read-heavy. But in many cases that isn’t exactly the true story. Why? Because you have to read a lot of data to update, delete, and display data. Even inserts involve reads, to look for duplicates, for example. OLAP systems should likely see more reads than writes, but loading data can mean a lot of data moved around, and reads may be fewer if the queries are efficient. When I was looking for stuff to post, this post stood out as one that I really liked (other than the aforementioned title!)

I came upon this idea when I found a post by Jason Massie (whose handle was statisticsio back then) that I had referenced, and Kendal Van Dyke alerted me to this post at SQL Saturday in Atlanta (this would have been back in 2008 or 2009). After reading that (long gone) post, I wrote this post using sys.dm_db_index_usage_stats, and added references to sys.dm_io_virtual_file_stats as well.

Click through for the script. My expectations going in would be that OLTP and OLAP servers would actually have fairly similar read-write ratios. We talk about OLTP being “write-heavy” and OLAP being “read-heavy” but in practice, you need a lot of write behavior to sustain a warehouse—especially one that isn’t just a “truncate and reload once a day” type of thing—and people care about reading the data from their OLTP systems. In practice, I see the split as more of optimizing for accuracy in data entry (OLTP) versus simplicity of reading data (OLAP).
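The ratio arithmetic itself is simple. As a rough sketch, using invented counters shaped like sys.dm_db_index_usage_stats columns (not output from the actual script — seeks, scans, and lookups count as reads; updates as writes):

```python
# Made-up index usage counters, shaped like sys.dm_db_index_usage_stats
# rows, purely for illustration.
rows = [
    {"index": "PK_Orders",   "user_seeks": 9_000, "user_scans": 500,
     "user_lookups": 500, "user_updates": 2_000},
    {"index": "IX_Customer", "user_seeks": 3_000, "user_scans": 1_000,
     "user_lookups": 0,   "user_updates": 2_000},
]

def read_write_ratio(rows):
    """Total read operations divided by total write operations."""
    reads = sum(r["user_seeks"] + r["user_scans"] + r["user_lookups"]
                for r in rows)
    writes = sum(r["user_updates"] for r in rows)
    return reads / writes if writes else float("inf")

print(read_write_ratio(rows))  # 14000 reads / 4000 writes = 3.5
```

Even this "write-heavy" OLTP-shaped sample comes out read-dominant, which is the point of the article: the ratios rarely match the OLTP/OLAP folklore.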


Building the Well-Architected Framework for Fabric

Joey D’Antoni applies Azure principles to Microsoft Fabric:

Let’s take a step back and talk about why I built this session. Like it or not, Microsoft’s intention with Fabric (and Power BI before it) is to make it easier for less-technical business users to build and consume data-driven reports. I understand this mission, and it has been wildly successful in spreading love for Power BI. But despite Fabric’s software-as-a-service branding, it’s actually a fully fledged data engine that needs to be well-managed to ensure data governance, security, and adherence to general best practices. In building my demos, I created a sample workspace with a couple of objects.

Click through for more notes on Joey’s talk, as well as a link to the code.


Exploring the Fabric Ontology

Jens Vestergaard takes a peek at Ontologies in Microsoft Fabric:

I have been spending a little time with the Microsoft Fabric data agent documentation lately, and one pattern keeps showing up, not just in the official guidance but in community posts from people who have actually tried to deploy these things: the demo runs beautifully. The AI answers questions in plain English, leadership gets excited, the pilot gets approved. Then it hits production. Real users send real questions. The answers start drifting. Numbers that should match do not. The same question returns different results on different days. Trust evaporates faster than it was built.

And almost every time, the root cause is the same thing: the semantic foundation was not solid enough before anyone pointed an agent at it.

That is exactly the problem the Fabric Ontology is designed to address. It is the piece I think most teams will underestimate right up until the moment they need it.

Click through for an explanation. As I continue learning more about the concept of ontologies (not just in Fabric but in general), I’m slowly coming around to the idea. Though it still reminds me a lot of object-oriented programming with a no-code interface.


Thoughts on Database Keys and Constraints

Lee Asher digs into keys:

Keys come in two basic flavors: natural and surrogate. Natural keys, as their name suggests, are values that occur naturally in the data. They have real-world or business meaning: an email address, the street address of a building, the serial number of an inventory item, etc. In contrast, a surrogate key (sometimes called a synthetic key) is a value added to each row upon creation. A surrogate exists only within the confines of the database; it has no meaning outside of it.

A natural key often contains two or more columns. For instance, the key for a table of vehicle types might include the make, model, year, and trim level, all four columns of which must be combined to create the value that uniquely identifies each row. Surrogate keys are always a single column, though the value of each key may be anything you choose – as long as each value is distinct from all others.

One thing I would very strongly note here is that surrogate keys are a physical data model concept. I’m a firm believer that you almost always should have a surrogate key, but there must be a natural key, even if you don’t put an actual constraint on it. Though I do recommend having a unique key constraint on the natural key as well as a primary key constraint on the surrogate key.
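The two-constraint pattern is easy to sketch outside a database. A minimal illustration, assuming a hypothetical vehicle-types table held in plain dictionaries, where one structure plays the unique constraint on the composite natural key and another plays the primary key on the surrogate:

```python
from itertools import count

# Toy model of a vehicle-types table keyed both ways. The composite
# natural key (make, model, year, trim) must stay unique; the surrogate
# key is just the next integer and means nothing outside this "database".
_surrogate = count(1)
by_natural_key = {}   # acts like the unique constraint on the natural key
by_surrogate = {}     # acts like the primary key on the surrogate key

def insert_vehicle_type(make, model, year, trim):
    natural = (make, model, year, trim)
    if natural in by_natural_key:
        raise ValueError(f"duplicate natural key: {natural}")
    sid = next(_surrogate)
    by_natural_key[natural] = sid
    by_surrogate[sid] = natural
    return sid

insert_vehicle_type("Toyota", "Corolla", 2024, "LE")   # surrogate 1
insert_vehicle_type("Toyota", "Corolla", 2024, "XSE")  # surrogate 2
try:
    insert_vehicle_type("Toyota", "Corolla", 2024, "LE")
except ValueError as err:
    print(err)  # the "unique constraint" rejects the duplicate
```

Drop the `by_natural_key` check and nothing stops two logically identical rows from getting two different surrogate keys — which is exactly the failure mode behind the advice above.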


The Power of Naming Standards

Louis Davidson covers naming standards:

I am starting a new, huge project soon, using some different tools than I have used before. So the first thing we need to do in this project is to come up with new standards for these new tools, right? No.

Understanding the overall scope and desired results comes first: requirements, then a high-level design and architecture plan, including the tools one expects to use.

Standards, however, are something that should be in place before you start creating stuff.

I like naming standards a lot. I like them so much, I have a half-dozen of them and use them all at once.
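One nice thing about having a standard in place up front is that it can be enforced mechanically. As a toy illustration of what such a check might look like (the rules below are hypothetical, not Louis's actual conventions):

```python
import re

# A toy checker for one hypothetical naming standard: table names are
# PascalCase, and foreign key constraints follow FK_<Child>_<Parent>.
# These rules are invented for illustration.
TABLE_RULE = re.compile(r"^[A-Z][A-Za-z0-9]*$")
FK_RULE = re.compile(r"^FK_[A-Z][A-Za-z0-9]*_[A-Z][A-Za-z0-9]*$")

def check_names(tables, constraints):
    """Return every name that violates the (hypothetical) standard."""
    violations = [t for t in tables if not TABLE_RULE.match(t)]
    violations += [c for c in constraints if not FK_RULE.match(c)]
    return violations

bad = check_names(
    tables=["Customer", "tbl_order"],          # prefix breaks the rule
    constraints=["FK_Order_Customer", "fk1"],  # lowercase, no table names
)
print(bad)  # ['tbl_order', 'fk1']
```

A check like this can run in CI against generated DDL, which is what turns a standards document into a standard people actually follow.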


A Primer on Microsoft Fabric Deployment Pipelines

Hamish Watson asks a question:

In the realm of software development and content creation, the deployment pipeline serves as a crucial bridge between innovation and implementation. Whether you are fine-tuning code, testing new features, or releasing a polished product to end-users, the deployment pipeline guides your content through distinct stages, each playing a vital role in ensuring a smooth and efficient journey from development to production.

Read on for a high-level overview of deployment pipeline structure and methods.


An Edge Case with Row Locks and Joins in PostgreSQL

Haki Benita digs into a problem:

Here’s a database riddle: you have two tables with data connected by a foreign key. The foreign key field is set as not null and the constraint is valid and enforced. You execute a query that joins these two tables and you get no results! How is that possible? We thought it wasn’t possible, but a recent incident revealed an edge case we never thought about.

If you answered “READ COMMITTED isolation level is weird,” congratulations: you win a prize.


Architecting Your First Microservice

Bijoy Choudhury builds a process:

In any microservices migration, extracting services from all their dependencies and point-to-point integrations carries the most risk. If you feel hesitant about decomposing your application, that hesitation is justified. The first service extraction is uniquely challenging because you have to examine years of accumulated technical debt and unresolved organizational decisions at the same time. 

That’s why the objective for the first service extraction should not be to achieve immediate scalability or to redefine organizational practices, but to validate a narrow capability. It’s about identifying a discrete unit of functionality that can be isolated, deployed independently, and integrated with the existing system without rewriting the entire system or introducing instability.

There’s some good advice in here, as well as one reason why I’m not totally sold on microservices: the isolation of databases. This sounds great until you’re hitting seven different services to retrieve data 100x slower than a simple SQL query would have been because you have complex filtering criteria across these seven services. And then you build an extra layer of caching, introducing even more complexity to solve a problem that never needed to exist.
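To put rough numbers on that complaint (all latencies below are invented for illustration; real gaps depend on payload sizes, network conditions, and how much filtering each service can push down):

```python
# Back-of-the-envelope sketch: filtering across several isolated services
# means a chain of network hops, each with its own latency floor, versus
# one round trip that does the join and filter inside the database engine.
SERVICE_CALL_MS = 20   # hypothetical per-hop cost: network + serialization
SQL_QUERY_MS = 25      # hypothetical single round trip with an in-engine join

def sequential_fanout_ms(num_services: int, per_call_ms: float) -> float:
    # Worst case: each call's filter criteria depend on the previous
    # result, so the calls cannot be parallelized.
    return num_services * per_call_ms

chained = sequential_fanout_ms(7, SERVICE_CALL_MS)
print(chained, SQL_QUERY_MS)   # 140 ms vs 25 ms
print(chained / SQL_QUERY_MS)  # 5.6x slower, before any caching layer
```

And that multiplier is per request; the caching layer mentioned above exists precisely to amortize it, at the cost of a whole new invalidation problem.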


The Challenge of Many-to-Many Relationships in Power BI

Ben Richardson explains a common anti-pattern in Power BI semantic models:

Relationships sit at the heart of literally everything you do in Power BI.

Before you make measures, visuals and reports, relationships are established to define how your data fits together. Their job is simple on the surface – but vital: describe how each table is connected.

If you can design these relationships well, everything else will run much smoother.

Across any data domain, strong models rely on clear Grain, correct Cardinality, and a Star Schema built with well-defined Fact and Dimension tables.

Read on to understand how many-to-many relationships strain this model in Power BI, and different techniques for dealing with those sorts of relationships.
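The fan-out risk behind many-to-many relationships is easy to demonstrate outside Power BI. A minimal sketch, with invented sales and target rows whose join key is unique on neither side:

```python
# When the join key repeats on both sides, every match multiplies,
# so naive aggregation over the joined rows over-counts.
sales = [  # key repeats: the grain is (store, day), not store
    {"store": "A", "amount": 100},
    {"store": "A", "amount": 50},
]
targets = [  # key also repeats: one target row per quarter
    {"store": "A", "target": 400},
    {"store": "A", "target": 450},
]

# Joining on "store" pairs every sales row with every target row.
joined = [(s, t) for s in sales for t in targets
          if s["store"] == t["store"]]

print(len(joined))                          # 2 x 2 = 4 rows, not 2
print(sum(s["amount"] for s, _ in joined))  # 300: each sale counted twice
```

The true sales total is 150; the join doubles it. This is the same inflation a many-to-many relationship can introduce in a semantic model when the grain on each side is not pinned down, which is why the usual fix is a bridge table of distinct keys between the two.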
