Press "Enter" to skip to content

Category: Architecture

The Rise of Single-Purpose ML Frameworks

Pete Warden describes a phenomenon:

The GGML framework is just over a year old, but it has already changed the whole landscape of machine learning. Before GGML, an engineer wanting to run an existing ML model would start with a general purpose framework like PyTorch, find a data file containing the model architecture and weights, and then figure out the right sequence of calls to load and execute it. Today it’s much more likely that they will pick a model-specific code library like whisper.cpp or llama.cpp, based on GGML.

This isn’t the whole story though, because there are also popular model-specific libraries like llama2.cpp or llama.c that don’t use GGML, so this movement clearly isn’t based on the qualities of just one framework. The best term I’ve been able to come up with to describe these libraries is “disposable”. I know that might sound derogatory, but I don’t mean it like that, I actually think it’s the key to all their virtues! They’ve limited their scope to just a few models, focus on inference or fine-tuning rather than training from scratch, and overall try to do a few things very well. They’re not designed to last forever, as models change they’re likely to be replaced by newer versions, but they’re very good at what they do.

Pete calls them disposable ML frameworks, though I’d call them single-purpose frameworks to contrast with general-purpose ML frameworks like PyTorch and TensorFlow.

Comments closed

An Overview of Event-Driven Architecture

Yaniv Ben Hemo explains what event-driven architecture is:

First things first, Event-driven architecture. EDA and serverless functions are two powerful software patterns and concepts that have become popular in recent years with the rise of cloud-native computing. While one is more of an architecture pattern and the other a deployment or implementation detail, when combined, they provide a scalable and efficient solution for modern applications.

Click through for a primer on event-driven architecture. This is a pattern that I find quite useful for optimizing cloud pricing, assuming your normal business processes can run asynchronously—that is, people are not expecting near-real-time performance and you can start and stop processes periodically in order to “re-use” the same compute for multiple services. The alternative use of EDA is that your services need to be running all the time, but you also have multiple teams working together on the solution and you want to decouple team efforts. In that case, you define queues or Kafka-style topics and let those act as the mechanism for service integration.

This is definitely an architecture that works better for cloud-based systems than on-premises systems.

Comments closed

ORMs and Mapping Requirements

Mark Seemann is not a big fan of Entity Framework:

When I evaluate whether or not to use an ORM in situations like these, the core application logic is my main design driver. As I describe in Code That Fits in Your Head, I usually develop (vertical) feature slices one at a time, utilising an outside-in TDD process, during which I also figure out how to save or retrieve data from persistent storage.

Thus, in systems like these, storage implementation is an artefact of the software architecture. If a relational database is involved, the schema must adhere to the needs of the code; not the other way around.

To be clear, then, this article doesn’t discuss typical CRUD-heavy applications that are mostly forms over relational data, with little or no application logic. If you’re working with such a code base, an ORM might be useful. I can’t really tell, since I last worked with such systems at a time when ORMs didn’t exist.

Read on for a thoughtful argument. The only critique I have is I’d prefer stored procedures over saving SQL queries in the code.

1 Comment

The Medallion Architecture in Data Modeling

Nikola Ilic gets the gold:

The most common pattern for modeling the data in the lakehouse is called a medallion. I love this name – it’s really easy to remember. But, why medallion? Tag along and you’ll soon find out why.

The same as for the lakehouse concept, credits for being pioneers in the medallion approach goes to Databricks.

What I’ve found interesting is the number of people who have taken to disliking the medallion architecture terms because Databricks pushed it so hard that their clients automatically assumed “medallion = using Databricks.”

Comments closed

The Basics of Fact-Dimensional Modeling

Nikola Ilic gives us a primer on Kimball-style fact and dimensional modeling:

Before we come up to explain why dimensional modelling is named like that – dimensional, let’s first take a brief tour through some history lessons. In 1996, a man called Ralph Kimball published a book “The Data Warehouse Toolkit”, which is still considered a dimensional modelling “Bible”. In his book, Kimball introduced a completely new approach to modelling data for analytical workloads, the so-called “bottom-up” approach. The focus is on identifying key business processes within the organization and modelling these first, before introducing additional business processes.

This is a really good overview of the topic, though I’m saddened that “dimensional bus matrix” didn’t make the cut of things to discuss. Mostly because I like the name “dimensional bus matrix.”

Comments closed

Microsoft Fabric Architectural Icons

Marc Lelijveld imports some icons:

In the past, I’ve made a draw.io file for Power BI to help you using the right icons to design your solutions and make architectural diagrams. With Fabric, a bunch of new services and icons have been introduced. This asks for a new draw.io file.

With this blog, I will provide the draw.io file for all new icons and elements of Fabric.

Click through for that link. Also note that you might be more familiar with the new name of draw.io, diagrams.net.

Comments closed

Contrasting Kafka and Pulsar

Tessa Burk perform a comparson:

Apache Kafka® and Apache Pulsar™ are 2 popular message broker software options. Although they share certain similarities, there are big differences between them that impact their suitability for various projects.  

In this comparison guide, we will explore the functionality of Kafka and Pulsar, explain the differences between the software, who would use them, and why.  

Click through for that comparison. I haven’t used Pulsar before, so it’s interesting to get this sort of a functionality and community comparison.

Comments closed

Elastic Pools for Azure SQL DB Hyperscale

Arvind Shyamsundar announces a new preview:

We are very excited to announce the preview of elastic pools for Hyperscale service tier for Azure SQL Database!

For many years now, developers have selected the Hyperscale service tier in a “single database” resource model to power a wide variety of traditional and modern applications. Azure SQL Hyperscale is based on a cloud native architecture providing independently scalable compute and storage, and with limits which substantially exceed the resources available in the General Purpose and Business Critical tiers.

Click through to learn more about what’s on offer.

Comments closed

The Current Status of the Lakehouse Architecture

Paul Turley is happy:

When I first started attending conference and user group sessions about Lakehouse architecture, I didn’t get it at first, but I do now; and it checks all the boxes. As a Consulting Services Director in a practice with over 200 BI developers and data warehouse engineers, I see first-hand how our customers – large and small – are adopting the Lakehouse for BI, Data science and operational reporting.

Read on for Paul’s thoughts. My main concern with the strategy has always been performance, with the expectation that it’d take a few years for lakehouse systems to be ready for prime time. We’re getting close to that few years (back in 2020, I believe I estimated 2024-2025).

Comments closed

Data Mesh Q&A

Jean-Georges Perrin hosts another Q&A:

As part of the Data Mesh Learning Community, Eric Broda invited Laveena KewlaniKruthika Potlapally, and me to discuss the implementation of Data Mesh at PayPal. As expected, the session went longer than scheduled, and some questions remained open. As with the previous Q&A sessions ([#1] and [#2]), here is an attempt to answer them.

Click through for the questions, as well as the answers.

Comments closed