Press "Enter" to skip to content

Category: Architecture

An Overview of the Kappa Architecture

Amian Patnaik provides an overview:

The Kappa Architecture, introduced by Jay Kreps, co-founder of Confluent, is designed to handle real-time data processing in a scalable and efficient manner. Unlike the traditional Lambda Architecture, which separates data processing into batch and stream processing, the Kappa Architecture promotes a single pipeline for both batch and stream processing, eliminating the need for maintaining separate processing pipelines.

What’s interesting to me is that Lambda, an architecture which was an explicit product of its time (a compromise architecture trying to do two things that the limited hardware and tooling of the era couldn’t handle together), is still thriving today. Kappa, meanwhile, isn’t an architectural style that people throw around much anymore, at least in the circles I run in.
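
To make that single-pipeline idea concrete, here is a minimal sketch in plain Python, with a list standing in for a durable, replayable log such as Kafka; all names here are hypothetical. The point is that recomputing a view is just replaying the log through the same code that handles live events.

```python
from typing import Iterable

def process(events: Iterable[dict], state: dict) -> dict:
    """The single processing path: identical logic for replay and live data."""
    for event in events:
        key = event["user_id"]
        state[key] = state.get(key, 0) + event["amount"]
    return state

# A list stands in for the durable, replayable event log.
event_log = [
    {"user_id": "a", "amount": 10},
    {"user_id": "b", "amount": 5},
    {"user_id": "a", "amount": 7},
]

# "Batch" is just a replay of the log from the beginning...
state = process(event_log, {})

# ...and "streaming" is the same function applied to events as they arrive.
state = process([{"user_id": "b", "amount": 3}], state)
print(state)  # {'a': 17, 'b': 8}
```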


Well-Architected Framework Cost Optimization

Brandon Wilson cuts costs:

Hi everyone! Brandon Wilson (Cloud Solution Architect/Engineer) here to follow up on the post I authored previously for the Well-Architected Cost Optimization Assessment offering, with another customer offering we have, known as the Well-Architected Cost Optimization Implementation. This offering can be considered a continuation/“part 2” of sorts for the Well-Architected Cost Optimization Assessment, where the goal is to help you implement some of the findings relating to Azure Reservations, Azure Savings Plans, and Azure Hybrid Benefit, along with cleaning up some of that cloud waste sitting around.

Just as before (and in case you are a new reader), we’ll touch a little bit on the Azure Well-Architected Framework (WAF), along with the Cloud Adoption Framework (CAF), and then go over what is covered in the Well-Architected Cost Optimization Implementation offering itself.

Some of this is Microsoft-internal tooling, though the WAF assessments themselves are available to the general public and well worth going through.


Landing Zone Layouts for Modern Data Warehouses

Paul Hernandez builds out a landing zone for a warehouse:

In this article, I want to discuss some different layout options for a landing zone in a modern cloud data warehouse architecture. By landing zone, I mean a storage account where raw data lands directly from its source system (not to be confused with a landing zone for moving a system or application into the cloud).

One of the things I appreciate a lot about this post is that it covers the history, showing us how we got to where we are. Paul’s well-versed in each step along the way and lays things out clearly.
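
For a concrete illustration of one common layout option, here is a sketch of a date-partitioned folder convention for a landing zone. The scheme, container names, and paths are hypothetical assumptions for illustration, not taken from Paul’s article.

```python
from datetime import date

def landing_path(source_system: str, entity: str, load_date: date, file_name: str) -> str:
    """Build a date-partitioned path for a raw file in the landing zone."""
    return f"landing/{source_system}/{entity}/{load_date:%Y/%m/%d}/{file_name}"

# Example: raw customer extracts from a hypothetical CRM system.
print(landing_path("crm", "customers", date(2023, 5, 14), "customers_001.parquet"))
# landing/crm/customers/2023/05/14/customers_001.parquet
```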


Data Pipelines and Data Mesh

Jean-Georges Perrin answers a burning question:

I keep having questions about data pipelines. Data pipelines in Data Mesh is a topic I should tackle. So… Is the data pipeline the root of all evil?

Jean-Georges’s answer is quite in line with one of my favorite phrases: “Short answer: no, with an ‘if’; long answer: yes, with a ‘but.’” Read on for some thoughts on data pipelines and what the data mesh concept does to minimize harm.


Building a Dimension and Measure Matrix for Power BI

Olivier Van Steenlandt does some documentation:

In this blog post, I will guide you through all the required steps to get a Data Model Relationship Matrix in Power BI.

In case you don’t know what I mean: I would like to have a straightforward overview showing which attribute groups and measure groups I can combine from my Tabular Model in SQL Server Analysis Services.

The first thing I thought of was “this is very much like a bus matrix in the Kimball model.” It’s a little different, though, as the rows on the axis pertain to measure groups rather than business units.
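
To illustrate the shape of the end result (this is just a sketch with hypothetical metadata, not Olivier’s actual method), you could pivot a list of measure-group-to-dimension relationships into such a matrix with pandas:

```python
import pandas as pd

# Hypothetical relationship metadata: which measure groups can be sliced by
# which dimensions. In a real model, you would extract this from the Tabular
# model's metadata rather than hard-coding it.
relationships = pd.DataFrame(
    [
        ("Sales", "Date", True),
        ("Sales", "Product", True),
        ("Sales", "Employee", False),
        ("Inventory", "Date", True),
        ("Inventory", "Product", True),
    ],
    columns=["measure_group", "dimension", "related"],
)

# Pivot into a matrix: measure groups on the rows, dimensions on the columns.
matrix = relationships.pivot(
    index="measure_group", columns="dimension", values="related"
).fillna(False)
print(matrix)
```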


Architectural Erosion and Technical Debt

Uli Homann and Eric Charran (via Ben Brauer) talk about the concept of architectural erosion:

The way Eric thinks about architectural erosion is this: architects and engineers work together to construct a system or solution, and they launch that solution into production. It performs well for a period of time, and then change happens, whether it’s a change in requirements, a change in infrastructure, or a change in customer habits revealed by DevOps signals showing that people are using certain features and not others. What ends up happening is there’s a moment in time in which we look at velocity: how do we implement this change and make the application do what the customer or the end user wants as quickly as possible? Then there’s the strategic picture of managing things like technical debt: if I do something tactical, I’m probably going to do it fast and cheap and not necessarily the “right way.” That cost accrues against the architectural patterns, longevity, scalability, and all those other types of things, and then it goes into my pile of technical debt.

Read on to learn more about the topic and what we, as technical professionals, can do to mitigate this risk.


Data Mesh Q&A Round 2

Jean-Georges Perrin didn’t hear no bell:

How does the Data Mesh concept differ from similar efforts in the past, like EDM (Enterprise Data Model) or MDM (Master Data Management)?

Data Mesh will help us achieve those goals more quickly, as EDM and MDM projects are usually slow and their ROI starts showing only after deployment. The product approach of Data Mesh for its data products enables a product-lifecycle mentality that will help get from a current state to an (end?) state like EDM through versioning. It also allows EDM to be versioned more efficiently and reduces time to market.

Read on for a series of questions and answers around the topic of data mesh architecture.


Data Mesh Q&A

Jean-Georges Perrin answers some questions:

How about data virtualization? If you have different Data Hubs with different data models, how do you integrate them?

As illustrated in the next figure, you can use data virtualization pointing to various physical data stores. Your onboarding pipeline can be “virtual,” or at least leverage virtualized data stores. You will gain in data freshness by reducing latency, but you may be limited in the number of data transformations you can perform towards your interoperable model.

Read on for the full set of questions and answers.


Sun Modeling and SunBeam

Shannon Bloye takes us through a new analytics systems modeling technique:

Sun Modelling is a technique that was initially developed and taught by Mark Whitehorn as a professor of analytics at the University of Dundee, which is where our own Terry McCann encountered the approach whilst studying for his MSc. He does a great talk on the topic in this video.

A core aim of the method is to offer a simplicity that makes it accessible to end users as well as the usual technical professionals. The approach is a high-level visual means to model data around a business process.

This feels a bit like a Kimball model, but one where you’re explicitly diagramming hierarchies and common slicers.


Combining On-Demand and Spot VMs in AKS

Prakash P covers a topic near and dear to my heart—saving money by using spot instances:

While it’s possible to run the Kubernetes nodes in either on-demand or spot node pools separately, we can optimize the application cost without compromising reliability by placing the pods unevenly on spot and on-demand VMs using topology spread constraints. With a baseline number of pods deployed in the on-demand node pool offering reliability, we can scale out on the spot node pool based on load, at a lower cost.

I like this idea a lot, as spot instances trade off saving a lot of money (up to 90%) for unreliability: you lose the spot instance as soon as someone else comes in willing to pay more. This gives you the best of both worlds with AKS: emphasize spot instances for the money savings but retain the ability to use on-demand pricing for VMs when spot isn’t available. If I’m understanding the post correctly, this also reduces the downside risk of service instability when spot instances are bought out from under you, as Kubernetes will automatically spin pods up and down to keep a consistent number of instances available to users across the nodes.
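
As a sketch of the mechanism: the field names below are real Kubernetes pod-spec fields, but the maxSkew value, the app label, and the assumption that both node pools carry the priority label are illustrative choices, not taken from the post.

```python
import json

# Pod-spec fragment expressing "spread unevenly across spot and on-demand
# nodes". AKS sets the kubernetes.azure.com/scalesetpriority label to "spot"
# on spot pools; labeling the on-demand pool with the same key is an
# assumption of this sketch.
pod_spec_fragment = {
    "topologySpreadConstraints": [
        {
            # Allow up to 3 more pods in one priority domain than the other,
            # so the scheduler can favor the cheaper spot pool.
            "maxSkew": 3,
            "topologyKey": "kubernetes.azure.com/scalesetpriority",
            "whenUnsatisfiable": "ScheduleAnyway",
            "labelSelector": {"matchLabels": {"app": "web"}},
        }
    ],
    # Spot nodes in AKS are tainted; pods need this toleration to land there.
    "tolerations": [
        {
            "key": "kubernetes.azure.com/scalesetpriority",
            "operator": "Equal",
            "value": "spot",
            "effect": "NoSchedule",
        }
    ],
}

print(json.dumps(pod_spec_fragment, indent=2))
```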
