Architecture – Page 2

Designing a Microsoft Fabric Workspace

Published 2025-04-22 by Kevin Feasel

When planning a Data Platform for your organization on Microsoft Fabric, you need to consider workspaces during your design process. Proper workspace design is critical for the organization and consumption of Fabric items. Understanding how to effectively manage Microsoft Fabric workspaces can streamline your processes.

If your workspaces are not set up in a way that aligns with functional business verticals or environments (such as Dev/UAT/Prod), you will end up spending a significant amount of time and effort re-factoring this technical debt to meet the desired organizational structures. While it seems trivial to simply move a report, pipeline, or other workload from one workspace to another, the inbuilt dependencies can often be complex. With efficient planning and design efforts, these problems can be avoided.

Click through for Ron’s advice.

Comments closed

An Explanation of PostgreSQL’s Citus Extension

Published 2025-03-19 by Kevin Feasel

Craig Kerstiens covers a misunderstood extension:

Citus is in a small class of the most advanced Postgres extensions that exist. While there are many Postgres extensions out there, few have as many hooks into Postgres or change the storage and query behavior in such a dramatic way. Most that come to Citus have very wrong assumptions. Citus turns Postgres into a sharded, distributed, horizontally scalable database (that’s a mouthful), but it does so for very specific purposes.

Read on to learn when Citus can work well, when it isn’t a good fit, and a few architecture and design recommendations around using the extension.

Comments closed

Understanding Availability Zones in Azure

Published 2025-03-19 by Kevin Feasel

Mika Sutinen explains some of the nuance around Azure availability zones:

Azure Availability Zones help provide resiliency to your database services within an Azure Region. I simply love it how simple Microsoft has made building geographically dispersed database services. If you’ve ever designed and deployed multi-site, highly available database services in on-premises, you know what I am talking about.

However, with the Availability Zones in Azure, there are a couple of things to know. I’ve learned my lessons the hard way, so in this post I am providing some tools and guidance on how to avoid some pitfalls when building multi-zone database services.

Click through for that guidance.

Comments closed

ETL Orchestration and Air Traffic Control

Published 2025-03-11 by Kevin Feasel

Jens Vestergaard extends a metaphor:

We have been working getting an enterprise grade event driven orchestration of our ETL system to operate like an airport control tower, managing a fleet of flights (data processes) as they progress through various stages of take-off, transit, and landing. All of this, because Microsoft Fabric has a core-based limit to the number of Notebook executions that a capacity can execute and have queued up in line for execution when invoking them using the REST API. Read the details here: limits (you know, it’s funny that there is no stated limits for Azure Service Bus Queues on number of messages in queue, but there is for Microsoft Fabric, which uses a Service Bus queue underneath…)

That limitation is a bit annoying, but read on to see how Jens uses this metaphor to explain the various parts of an ETL orchestration engine.

Comments closed

Serving Databricks Models via API Management Endpoints

Published 2025-03-10 by Kevin Feasel

Drew Furgiuele makes available a model:

When it comes to generative AI projects I’d argue that the hardest and most tedious part has moved into a new area: hosting and serving your models. Whether you’re working with CPU intensive models, or models that require GPU horsepower, sourcing the hardware, building out deployment pipelines, configuring monitoring, and then securing everything is real, serious work that requires everyone to lean in to get it right.

And then, there’s the real question of how you’re going to use those models: will you be setting up automation and doing batch processing using your models and infrastructure? Or do you want to get really serious and offer up real-time inference? If the latter, you can add one more thing to solve for: managing your front-end APIs that you will have to build to support that use case.

Click through to see how you can use an API management tool (like Azure API Management) to assist in these things.

Comments closed

Multi-Measure Calculations in Relational Databases

Published 2025-03-05 by Kevin Feasel

Greg Low describes a common business problem:

But while food wholesale systems will need to deal with quantities like I described in that post, they often have another layer of complexity. Items are often sold by:

Quantity

Weight

Quantity and Weight

This is an interesting look at how the domain can drive what a proper solution looks like. It also seems like a good use case for 6th normal form, with unit quantity and unit weight tables to prevent NULL from cropping up.

Comments closed

Multi-Tenant Data Isolation Strategies

Published 2025-03-03 by Kevin Feasel

Rahul Miglani comes up with a list:

As organizations embrace cloud computing, multi-tenancy has become a popular architectural choice, enabling multiple customers (tenants) to share a single cloud environment. However, one of the biggest challenges in multi-tenancy is data isolation—ensuring that each tenant’s data remains private, secure, and accessible only to authorized users.

Microsoft Azure provides several data isolation strategies that allow businesses to securely manage and scale multi-tenant applications while ensuring compliance with regulatory standards like GDPR, HIPAA, and SOC 2.

In this blog, we will explore key data isolation strategies in multi-tenancy Azure architecture, their advantages, and best practices for implementation.

Reading through the list, the same set of options are available on-premises, though the calculus can be a bit different.

Comments closed

Thoughts on Scaling Elasticsearch

Published 2025-02-27 by Kevin Feasel

Vivek Kumar can’t stop at one:

With the evolution of modern applications serving increasing needs for real-time data processing and retrieval, scalability does, too. One such open-source, distributed search and analytics engine is Elasticsearch, which is very efficient at handling data in large sets and high-velocity queries. However, the process for effectively scaling Elasticsearch can be nuanced, since one needs a proper understanding of the architecture behind it and of performance tradeoffs.

Click through for those considerations and the trade-offs you might see.

Comments closed

Vertical Partitioning Rarely Works

Published 2025-02-26 by Kevin Feasel

Brent Ozar lays out an argument:

You’re looking at a wide table with 100-200 columns.

Years ago, it started as a “normal” table with maybe 10-20, but over the years, people kept gradually adding one column after another. Now, this behemoth is causing you problems because:

The table’s size on disk is huge

Queries are slow, especially when they do table scans

People are still asking for more columns, and you feel guilty saying yes

You’ve started to think about vertical partitioning: splitting the table up into one table with the commonly used columns, and another table with the rarely used columns. You figure you’ll only join to the rarely-used table when you need data from it.

Read on to understand why this is rarely a good idea and what you can do instead.

I will say that I’ve had success with vertical partitioning in very specific circumstances:

There are large columns, like blobs of JSON, binary data, or very large text strings.
There exists a subset of columns the application (or caller) rarely needs.
Those large columns are in the subset of columns the caller rarely needs, or can access via point lookup.

For a concrete example, my team at a prior company worked on a product that performed demand forecasting on approximately 10 million products across the customer base. For each product, we had the choice between using a common model (if the sales fit a common pattern) or generating a unique model. Because we were using SQL Server Machine Learning Services, we needed to store those custom models in the database. But each model, even when compressed, could run in the kilobytes to megabytes in size. We only needed to retrieve the model during training or inference, not reporting, but we did have reports that tracked what kind of model we were using, whether it was a standard model or custom (and if custom, what algorithm we used). Thus, we had the model binary in its own table, separate from the remaining model data.

Comments closed

Failover Groups in Azure SQL Database

Published 2025-02-21 by Kevin Feasel

Mika Sutinen looks at some interesting functionality:

One of the interesting features in Azure SQL Database is the Failover Groups. It allows you to manage replication of an Azure SQL database, or group of databases, to another logical server. The reason I’ve bolded the manage replication is, that the replication itself is handled by active geo-replication, which is also a feature of Azure SQL Database.

Read on to see how these are different and why you might want to use failover groups.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Category: Architecture