Category: Architecture

A data contract is a formal agreement between an upstream component and a downstream component on the structure and semantics of data that is in motion. The upstream component enforces the data contract, while the downstream component can assume that the data it receives conforms to the data contract. Data contracts are important because they provide transparency over dependencies and data usage in a streaming architecture. They help to ensure the consistency, reliability, and quality of the data in event streams, and they provide a single source of truth for understanding the data in motion.

Click through for a sample application that uses data contracts.
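
To make the upstream/downstream split in the quoted passage a bit more concrete, here is a minimal sketch in Python. It is not taken from the linked sample application: the `OrderPlaced` event, its fields, the `ContractViolation` error, and the list standing in for an event stream are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


class ContractViolation(ValueError):
    """Raised when a record does not conform to the contract."""


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical contract for an 'order placed' event."""
    order_id: str
    amount_cents: int
    placed_at: datetime

    def __post_init__(self):
        # Structural and semantic checks live with the contract itself.
        if not isinstance(self.order_id, str) or not self.order_id:
            raise ContractViolation("order_id must be a non-empty string")
        if not isinstance(self.amount_cents, int) or self.amount_cents < 0:
            raise ContractViolation("amount_cents must be a non-negative integer")
        if not isinstance(self.placed_at, datetime):
            raise ContractViolation("placed_at must be a datetime")


def publish(stream: list, raw: dict) -> None:
    """Upstream side: enforce the contract before anything is emitted."""
    try:
        event = OrderPlaced(**raw)
    except TypeError as exc:  # missing or unexpected fields
        raise ContractViolation(str(exc)) from exc
    stream.append(event)


def total_revenue_cents(stream: list) -> int:
    """Downstream side: assumes conformance, so it does not re-validate."""
    return sum(event.amount_cents for event in stream)
```

The design choice worth noticing is that validation happens exactly once, on the producing side. A record with a negative amount or a missing field never reaches the stream, which is what lets the consumer stay simple.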

Comments closed

A Primer on Functional Programming

Published 2023-10-17 by Kevin Feasel

Anirban Shaw gives us the skinny:

In the ever-evolving landscape of software development, there exists a paradigm that has been gaining momentum and reshaping the way we approach coding challenges: functional programming.

In this article, we delve deep into the world of functional programming, exploring its advantages, core principles, origin, and reasons behind its growing traction.

I like this as an introduction to the topic, helping explain what functional programming languages are and why they’ve become much more interesting over the past 15-20 years. Anirban hits the topic of concurrency well, showing how a functional approach with immutable data makes it easy for multiple machines to work on separate parts of the problem independently and concurrently without error. I’d also add one more bit: functional programming languages tend to be more CPU-intensive than imperative languages, so in an era of strict computational scarcity, imperative languages dominated. With strides in computer processing, we tend to be CPU-bound less often, so the trade-off of some CPU for the benefits of FP makes a lot more sense. H/T R-Bloggers.
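
As a rough illustration of the concurrency point, here is a small Python sketch. It is not from Anirban's article: the function names are hypothetical, and a thread pool stands in for the separate workers or machines. In CPython, threads will not speed up this CPU-bound work, so treat it as a demonstration of correctness without locks rather than of performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def sum_of_squares(chunk: Tuple[int, ...]) -> int:
    """Pure function: the result depends only on the input, and nothing shared is modified."""
    return sum(x * x for x in chunk)


def split(data: Tuple[int, ...], parts: int) -> Tuple[Tuple[int, ...], ...]:
    """Cut an immutable tuple into up to `parts` contiguous, non-overlapping chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return tuple(data[i:i + size] for i in range(0, len(data), size))


def parallel_sum_of_squares(data: Tuple[int, ...], workers: int = 4) -> int:
    """Each worker gets its own chunk; partial results are combined at the end."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

Because the input is a tuple and `sum_of_squares` has no side effects, the workers cannot interfere with one another, and the answer is the same regardless of how the chunks are scheduled.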


The Rise of Single-Purpose ML Frameworks

Published 2023-10-16 by Kevin Feasel

Pete Warden describes a phenomenon:

The GGML framework is just over a year old, but it has already changed the whole landscape of machine learning. Before GGML, an engineer wanting to run an existing ML model would start with a general purpose framework like PyTorch, find a data file containing the model architecture and weights, and then figure out the right sequence of calls to load and execute it. Today it’s much more likely that they will pick a model-specific code library like whisper.cpp or llama.cpp, based on GGML.

This isn’t the whole story though, because there are also popular model-specific libraries like llama2.cpp or llama.c that don’t use GGML, so this movement clearly isn’t based on the qualities of just one framework. The best term I’ve been able to come up with to describe these libraries is “disposable”. I know that might sound derogatory, but I don’t mean it like that, I actually think it’s the key to all their virtues! They’ve limited their scope to just a few models, focus on inference or fine-tuning rather than training from scratch, and overall try to do a few things very well. They’re not designed to last forever, as models change they’re likely to be replaced by newer versions, but they’re very good at what they do.

Pete calls them disposable ML frameworks, though I’d call them single-purpose frameworks to contrast with general-purpose ML frameworks like PyTorch and TensorFlow.


An Overview of Event-Driven Architecture

Published 2023-10-16 by Kevin Feasel

Yaniv Ben Hemo explains what event-driven architecture is:

First things first, Event-driven architecture. EDA and serverless functions are two powerful software patterns and concepts that have become popular in recent years with the rise of cloud-native computing. While one is more of an architecture pattern and the other a deployment or implementation detail, when combined, they provide a scalable and efficient solution for modern applications.

Click through for a primer on event-driven architecture. This is a pattern that I find quite useful for optimizing cloud pricing, assuming your normal business processes can run asynchronously—that is, people are not expecting near-real-time performance and you can start and stop processes periodically in order to “re-use” the same compute for multiple services. The alternative use of EDA is that your services need to be running all the time, but you also have multiple teams working together on the solution and you want to decouple team efforts. In that case, you define queues or Kafka-style topics and let those act as the mechanism for service integration.

This is definitely an architecture that works better for cloud-based systems than on-premises systems.
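
To sketch the topic-as-integration-point idea in code, here is a toy in-memory example in Python. It is only Kafka-style in spirit: the `Topic` class, its `publish` and `poll` methods, and the consumer group names are all hypothetical and do not correspond to any real client library.

```python
from collections import defaultdict
from typing import Any, Dict, List


class Topic:
    """Toy append-only log with one read offset per consumer group."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._log: List[Any] = []
        self._offsets: Dict[str, int] = defaultdict(int)

    def publish(self, event: Any) -> None:
        """Producers only append; they know nothing about who reads."""
        self._log.append(event)

    def poll(self, group: str, max_events: int = 10) -> List[Any]:
        """Return the next unread events for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch


# Hypothetical usage: one producing service, two independent consuming teams.
orders = Topic("orders")
for order_id in range(3):
    orders.publish({"order_id": order_id})

billing_batch = orders.poll("billing", max_events=2)  # billing reads at its own pace
shipping_batch = orders.poll("shipping")              # shipping is unaffected by billing
```

Each group keeps its own position in the log, so a consumer can be stopped and restarted later and simply picks up where it left off. That is the property that makes both use cases above workable: periodic, batch-like processing on shared compute, and teams that integrate through the topic rather than through each other's code.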


ORMs and Mapping Requirements

Published 2023-09-21 by Kevin Feasel

Mark Seemann is not a big fan of Entity Framework:

When I evaluate whether or not to use an ORM in situations like these, the core application logic is my main design driver. As I describe in Code That Fits in Your Head, I usually develop (vertical) feature slices one at a time, utilising an outside-in TDD process, during which I also figure out how to save or retrieve data from persistent storage.

Thus, in systems like these, storage implementation is an artefact of the software architecture. If a relational database is involved, the schema must adhere to the needs of the code; not the other way around.

To be clear, then, this article doesn’t discuss typical CRUD-heavy applications that are mostly forms over relational data, with little or no application logic. If you’re working with such a code base, an ORM might be useful. I can’t really tell, since I last worked with such systems at a time when ORMs didn’t exist.

Read on for a thoughtful argument. The only critique I have is that I’d prefer stored procedures over saving SQL queries in the code.


The Medallion Architecture in Data Modeling

Published 2023-08-29 by Kevin Feasel

Nikola Ilic gets the gold:

The most common pattern for modeling the data in the lakehouse is called a medallion. I love this name – it’s really easy to remember. But, why medallion? Tag along and you’ll soon find out why.

The same as for the lakehouse concept, credits for being pioneers in the medallion approach goes to Databricks.

What I’ve found interesting is the number of people who have taken to disliking the medallion architecture terminology because Databricks pushed it so hard that their clients automatically assumed “medallion = using Databricks.”


The Basics of Fact-Dimensional Modeling

Published 2023-07-21 by Kevin Feasel

Nikola Ilic gives us a primer on Kimball-style fact and dimensional modeling:

Before we come up to explain why dimensional modelling is named like that – dimensional, let’s first take a brief tour through some history lessons. In 1996, a man called Ralph Kimball published a book “The Data Warehouse Toolkit”, which is still considered a dimensional modelling “Bible”. In his book, Kimball introduced a completely new approach to modelling data for analytical workloads, the so-called “bottom-up” approach. The focus is on identifying key business processes within the organization and modelling these first, before introducing additional business processes.

This is a really good overview of the topic, though I’m saddened that “dimensional bus matrix” didn’t make the cut of things to discuss. Mostly because I like the name “dimensional bus matrix.”
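
For readers who have not seen the fact-and-dimension split before, here is a tiny sketch in Python. It is not from Nikola's post: the sales business process, the table names, the keys, and the numbers are all made up for illustration, and plain dictionaries stand in for database tables.

```python
from collections import defaultdict

# Hypothetical dimension tables: descriptive attributes, keyed by surrogate key.
dim_product = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Gadget", "category": "Hardware"},
    3: {"name": "Manual", "category": "Books"},
}
dim_date = {
    20230721: {"year": 2023, "month": 7},
    20230829: {"year": 2023, "month": 8},
}

# Hypothetical fact table for one business process (sales):
# foreign keys into the dimensions plus numeric measures.
fact_sales = [
    {"date_key": 20230721, "product_key": 1, "quantity": 2, "amount": 20.0},
    {"date_key": 20230721, "product_key": 3, "quantity": 1, "amount": 15.0},
    {"date_key": 20230829, "product_key": 2, "quantity": 4, "amount": 100.0},
    {"date_key": 20230829, "product_key": 1, "quantity": 1, "amount": 10.0},
]


def sales_by(attribute, dimension, key_column, facts=fact_sales):
    """Total the amount measure, grouped by one attribute of one dimension."""
    totals = defaultdict(float)
    for row in facts:
        member = dimension[row[key_column]]  # look up the dimension row
        totals[member[attribute]] += row["amount"]
    return dict(totals)


print(sales_by("category", dim_product, "product_key"))  # {'Hardware': 130.0, 'Books': 15.0}
print(sales_by("month", dim_date, "date_key"))           # {7: 35.0, 8: 110.0}
```

The same fact rows can be sliced by any attribute of any dimension they reference, which is the main payoff of keeping measures and descriptive context in separate tables.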


Microsoft Fabric Architectural Icons

Published 2023-07-11 by Kevin Feasel

Marc Lelijveld imports some icons:

In the past, I’ve made a draw.io file for Power BI to help you using the right icons to design your solutions and make architectural diagrams. With Fabric, a bunch of new services and icons have been introduced. This asks for a new draw.io file.

With this blog, I will provide the draw.io file for all new icons and elements of Fabric.

Click through for that link. Also note that you might be more familiar with the new name of draw.io, diagrams.net.


Contrasting Kafka and Pulsar

Published 2023-05-25 by Kevin Feasel

Tessa Burk performs a comparison:

Apache Kafka® and Apache Pulsar™ are 2 popular message broker software options. Although they share certain similarities, there are big differences between them that impact their suitability for various projects.

In this comparison guide, we will explore the functionality of Kafka and Pulsar, explain the differences between the software, who would use them, and why.

Click through for that comparison. I haven’t used Pulsar before, so it’s interesting to get this sort of functionality and community comparison.


Elastic Pools for Azure SQL DB Hyperscale

Published 2023-05-24 by Kevin Feasel

Arvind Shyamsundar announces a new preview:

We are very excited to announce the preview of elastic pools for Hyperscale service tier for Azure SQL Database!

For many years now, developers have selected the Hyperscale service tier in a “single database” resource model to power a wide variety of traditional and modern applications. Azure SQL Hyperscale is based on a cloud native architecture providing independently scalable compute and storage, and with limits which substantially exceed the resources available in the General Purpose and Business Critical tiers.

Click through to learn more about what’s on offer.
