Architecture – Curated SQL

The Small Data Showdown in Microsoft Fabric

Published 2025-07-01 by Kevin Feasel

First, let’s revisit the purpose of the benchmark: The objective is to explore data engineering engines available in Fabric to understand whether Spark with vectorized execution (the Native Execution Engine) should be considered in small data architectures.

Beyond refreshing the benchmark to see if any core findings have changed, I do want to expand in a few areas where I got great feedback from the community:

I really appreciate the approach behind this, both in terms of sticking to more realistic data sizes for many operations as well as performing this test given all of the recent improvements in each engine.

Building Entity-Relationship Diagrams with DBeaver

Published 2025-06-26 by Kevin Feasel

Dave Stokes builds a diagram:

Even the most experienced database professionals are known to feel a little anxious when peering into an unfamiliar database. Hopefully, they inspect to see how the data is normalized and how the various tables are combined to answer complex queries. Entity Relationship Maps (ERM) provide a visual overview of how tables are related and can document the structure of the data.

Read on to see how you can do this with the DBeaver database access client.

Building an ML-Friendly Data Lake with Apache Iceberg

Published 2025-05-23 by Kevin Feasel

Anant Kumar designs a data lake:

As companies collect massive amounts of data to fuel their artificial intelligence and machine learning initiatives, finding the right data architecture for storing, managing, and accessing such data is crucial. Traditional data storage practices are likely to fall short to meet the scale, variety, and velocity required by modern AI/ML workflows. Apache Iceberg steps in as a strong open-source table format to build solid and efficient data lakes for AI and ML.

Click through for a primer on Iceberg, how to set up a fairly simple data lake, and some functionality that can help in model training.

Comments closed

Comparing Microsoft Fabric Engines

Published 2025-05-13 by Kevin Feasel

Nikola Ilic performs a comparison:

Before we proceed, an important disclaimer: the guidance I’m providing here is based on both my experience with implementing Microsoft Fabric in real-world scenarios, and the recommended practices provided by Microsoft.

Please keep in mind that the guidance relies on general recommended practices (I intentionally avoid using the phrase best practices, because the best is very hard to determine and agree on). The word general means that the practice I recommend should be used in most of the respective scenarios, but there will always be edge cases when the recommended practice is simply not the best solution. Therefore, you should always evaluate whether the general recommended practice makes sense in your specific use case.

Click through for a comparison between three engines: the lakehouse, the warehouse, and the eventhouse. It would really simplify things if the lakehouse and warehouse combined into one coherent whole.

Comments closed

Organizing a Microsoft Fabric Data Platform with Domains

Published 2025-05-07 by Kevin Feasel

Jon Vöge does a bit of organization:

A topic which seems more relevant than ever, is the question of how to organize the contents of your Microsoft Fabric Platform.

Through the contents of a few blogs, I will give you an overview of things to consider, as well as suggestions that you can choose from when designing your platform.

This first week, we’ll take a look at Domains in Microsoft Fabric.

Read on to understand why domains can be valuable and a solid way to structure them.

Comments closed

Choosing a Warehousing Data Architecture

Published 2025-05-07 by Kevin Feasel

James Serra compares and contrasts OLAP architectures:

As discussed in my blog and book “Deciphering Data Architectures: Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh” (Amazon), organizations are often challenged with choosing the right data architecture to meet their business goals—especially as AI and data-driven decision-making take center stage. To help clarify, here’s a quick review of the four core architectures, followed by guidance on when to use each. Each architecture includes five stages of data movement – ingest, store, transform, model, and visualize (described here).

Click through for James’s take on how each of them works and when you might choose one over the other.

Comments closed

Designing a Microsoft Fabric Workspace

Published 2025-04-22 by Kevin Feasel

Ron L’Esteve lays it out:

When planning a Data Platform for your organization on Microsoft Fabric, you need to consider workspaces during your design process. Proper workspace design is critical for the organization and consumption of Fabric items. Understanding how to effectively manage Microsoft Fabric workspaces can streamline your processes.

If your workspaces are not set up in a way that aligns with functional business verticals or environments (such as Dev/UAT/Prod), you will end up spending a significant amount of time and effort re-factoring this technical debt to meet the desired organizational structures. While it seems trivial to simply move a report, pipeline, or other workload from one workspace to another, the inbuilt dependencies can often be complex. With efficient planning and design efforts, these problems can be avoided.

Click through for Ron’s advice.

Comments closed

An Explanation of PostgreSQL’s Citus Extension

Published 2025-03-19 by Kevin Feasel

Craig Kerstiens covers a misunderstood extension:

Citus is in a small class of the most advanced Postgres extensions that exist. While there are many Postgres extensions out there, few have as many hooks into Postgres or change the storage and query behavior in such a dramatic way. Most that come to Citus have very wrong assumptions. Citus turns Postgres into a sharded, distributed, horizontally scalable database (that’s a mouthful), but it does so for very specific purposes.

Read on to learn when Citus can work well, when it isn’t a good fit, and a few architecture and design recommendations around using the extension.

Comments closed

Understanding Availability Zones in Azure

Published 2025-03-19 by Kevin Feasel

Mika Sutinen explains some of the nuance around Azure availability zones:

Azure Availability Zones help provide resiliency to your database services within an Azure Region. I simply love it how simple Microsoft has made building geographically dispersed database services. If you’ve ever designed and deployed multi-site, highly available database services in on-premises, you know what I am talking about.

However, with the Availability Zones in Azure, there are a couple of things to know. I’ve learned my lessons the hard way, so in this post I am providing some tools and guidance on how to avoid some pitfalls when building multi-zone database services.

Click through for that guidance.

Comments closed

ETL Orchestration and Air Traffic Control

Published 2025-03-11 by Kevin Feasel

Jens Vestergaard extends a metaphor:

We have been working getting an enterprise grade event driven orchestration of our ETL system to operate like an airport control tower, managing a fleet of flights (data processes) as they progress through various stages of take-off, transit, and landing. All of this, because Microsoft Fabric has a core-based limit to the number of Notebook executions that a capacity can execute and have queued up in line for execution when invoking them using the REST API. Read the details here: limits (you know, it’s funny that there is no stated limits for Azure Service Bus Queues on number of messages in queue, but there is for Microsoft Fabric, which uses a Service Bus queue underneath…)

That limitation is a bit annoying, but read on to see how Jens uses this metaphor to explain the various parts of an ETL orchestration engine.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: Architecture