Press "Enter" to skip to content

Category: Architecture

Data Pipelines and Data Mesh

Jean-Georges Perrin answers a burning question:

I keep having questions about data pipelines. Data pipelines in Data Mesh is a topic I should tackle. So… Is the data pipeline the root of all evil?

Jean-Georges’s answer is quite in line with one of my favorite phrases: “Short answer: no, with an ‘if’; long answer: yes, with a ‘but.’” Read on for some thoughts on data pipelines and what the data mesh concept does to minimize harm.


Building a Dimension and Measure Matrix for Power BI

Olivier Van Steenlandt does some documentation:

In this blog post, I will guide you through all the required steps to get a Data Model Relationship Matrix in Power BI.

If you don’t know what I mean: I would like to have a straightforward overview where I can see which attribute groups and measure groups I can combine from my Tabular Model in (SQL Server) Analysis Services.

The first thing I thought of was “this is very much like a bus matrix in the Kimball model.” It’s a little different, though, as the rows on one axis pertain to measure groups rather than business units.
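
To make the idea concrete outside of Power BI, here is a minimal sketch of building such a matrix in Python with pandas, assuming you have already exported the model’s relationships (for instance, from the $SYSTEM.TMSCHEMA_RELATIONSHIPS DMV). The table names below are hypothetical, and this is an illustration of the concept rather than Olivier’s implementation:

```python
import pandas as pd

# Hypothetical relationship metadata exported from a Tabular model:
# (measure group / fact table, dimension) pairs.
relationships = [
    ("FactSales", "DimDate"),
    ("FactSales", "DimProduct"),
    ("FactSales", "DimCustomer"),
    ("FactInventory", "DimDate"),
    ("FactInventory", "DimProduct"),
]

df = pd.DataFrame(relationships, columns=["MeasureGroup", "Dimension"])

# Cross-tabulate: a 1 marks a measure group that can be sliced by a dimension.
matrix = pd.crosstab(df["MeasureGroup"], df["Dimension"])
print(matrix)
```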


Architectural Erosion and Technical Debt

Uli Homann and Eric Charran (via Ben Brauer) talk about the concept of architectural erosion:

The way Eric thinks about architectural erosion: architects and engineers work together to construct a system or solution and launch it into production. It performs well for a period of time, and then change happens, whether a change in requirements, a change in infrastructure, or a change in customer habits surfaced through DevOps signals showing that people are using one feature and not another. At that moment, we focus on velocity: how do we implement this change and make the application experience do what the customer or end user wants as quickly as possible? Then there’s the strategic picture of managing things like technical debt: if I do something tactical, I’m probably going to do it fast and cheap and not necessarily the “right way.” That tactical work accrues against the architectural patterns, longevity, scalability, and all those other qualities, and it goes into my pile of technical debt.

Read on to learn more about the topic and what we, as technical professionals, can do to mitigate this risk.


Data Mesh Q&A Round 2

Jean-Georges Perrin didn’t hear no bell:

How does the Data Mesh concept differ from similar efforts in the past, like EDM (Enterprise Data Model) or MDM (Master Data Model)?
Data Mesh will help us achieve those goals more quickly as those EDM and MDM projects are usually slow, and the ROI starts showing only after deployment. The product approach of Data Mesh for its data products enables a product lifecycle mentality that will help get from a current state to an (end?) state like EDM through versioning. It also allows EDM to be versioned more efficiently and reduces time to market.

Read on for a series of questions and answers around the topic of data mesh architecture.


Data Mesh Q&A

Jean-Georges Perrin answers some questions:

How about data virtualization? If you have different Data Hubs with different data models, how do you integrate them?

As illustrated in the next figure, you can use data virtualization pointing to various physical data stores. Your onboarding pipeline can be “virtual,” or at least leverage virtualized data stores. You will gain in data freshness by reducing latency, but you may be limited in the number of data transformations you can perform toward your interoperable model.

Read on for the full set of questions and answers.


Sun Modeling and SunBeam

Shannon Bloye takes us through a new analytics systems modeling technique:

Sun Modelling is a technique initially developed and taught by Mark Whitehorn as a professor of analytics at the University of Dundee, which is where our own Terry McCann encountered the approach whilst studying for his MSc. Terry does a great talk on the topic in this video.

A core aim of the method is to offer a simplicity that makes it accessible to end users as well as the usual technical professionals. The approach is a high-level visual means to model data around a business process.

This feels a bit like a Kimball model, but one where you’re explicitly diagramming hierarchies and common slicers.


Combining On-Demand and Spot VMs in AKS

Prakash P covers a topic near and dear to my heart—saving money by using spot instances:

While it’s possible to run the Kubernetes nodes in on-demand or spot node pools separately, we can optimize the application cost without compromising reliability by placing the pods unevenly on spot and on-demand VMs using topology spread constraints. With a baseline number of pods deployed in the on-demand node pool offering reliability, we can scale the spot node pool based on the load at a lower cost.

I like this idea a lot, as spot instances trade off saving a lot of money (up to 90%) for unreliability: you lose the spot instance as soon as someone else comes in willing to pay more. This gives you the best of both worlds with AKS: emphasize spot instances for the money savings but include the ability to use on-demand pricing for VMs when spot isn’t available. If I’m understanding the post correctly, this also reduces the downside risk of service instability when spot instances are bought out from under you, as Kubernetes will automatically reschedule pods on the remaining nodes to keep a consistent number of instances available to users.
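
To see roughly what that looks like, here is a minimal sketch (my own, not from the post) using the official Kubernetes Python client. It defines a Deployment that tolerates the taint AKS puts on spot nodes and spreads replicas across the spot/on-demand boundary via the kubernetes.azure.com/scalesetpriority node label; it assumes the on-demand pool also carries that label, and the app name, image, and replica count are hypothetical:

```python
from kubernetes import client, config

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "my-app"},
    "spec": {
        "replicas": 6,
        "selector": {"matchLabels": {"app": "my-app"}},
        "template": {
            "metadata": {"labels": {"app": "my-app"}},
            "spec": {
                # Allow scheduling onto spot nodes, which AKS taints by default.
                "tolerations": [{
                    "key": "kubernetes.azure.com/scalesetpriority",
                    "operator": "Equal",
                    "value": "spot",
                    "effect": "NoSchedule",
                }],
                # Spread pods across the spot/on-demand boundary; maxSkew
                # controls how uneven the split is allowed to be.
                "topologySpreadConstraints": [{
                    "maxSkew": 2,
                    "topologyKey": "kubernetes.azure.com/scalesetpriority",
                    "whenUnsatisfiable": "ScheduleAnyway",
                    "labelSelector": {"matchLabels": {"app": "my-app"}},
                }],
                "containers": [{"name": "my-app", "image": "nginx"}],
            },
        },
    },
}

config.load_kube_config()  # assumes a local kubeconfig pointed at the cluster
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

With ScheduleAnyway, the constraint is a soft preference, so pods can still land entirely on on-demand nodes if the spot pool is reclaimed.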


Kafka Control and Data Planes

Sanjay Garde explains how the architecture of Apache Kafka solutions has expanded over time:

With the advent of service mesh and containerized applications, the idea of the control and data plane has become popular. A part of your application infrastructure, such as a proxy or sidecar, is dedicated to aspects such as controlling traffic, access, governance, security, and monitoring; this is referred to as the control plane. Another part of your application infrastructure, used purely for processing your business transactions, is referred to as the data plane.

Read on to see how the concept works at an architectural level.
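
One way to make the distinction concrete is in client code. In this sketch (my illustration, not from the article) using the confluent-kafka Python package, the AdminClient handles a control-plane concern (creating and configuring a topic) while the Producer carries data-plane traffic; the broker address and topic name are hypothetical:

```python
from confluent_kafka import Producer
from confluent_kafka.admin import AdminClient, NewTopic

conf = {"bootstrap.servers": "localhost:9092"}  # hypothetical broker

# Control plane: define the topic (layout, replication) before any data flows.
admin = AdminClient(conf)
futures = admin.create_topics([NewTopic("orders", num_partitions=3, replication_factor=1)])
for topic, future in futures.items():
    future.result()  # block until the broker confirms creation

# Data plane: the business traffic itself.
producer = Producer(conf)
producer.produce("orders", key="order-42", value=b'{"amount": 19.99}')
producer.flush()
```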


Recommendations for Dedicated SQL Pool Data Modeling

Bhaskar Sharma has some advice:

In this article, I will discuss how to physically model an Azure Synapse Analytics data warehouse while migrating from an existing on-premises MPP (Massively Parallel Processing) data warehouse solution like Teradata or Netezza. The approach and methodologies discussed in this article are purely based on the knowledge and insight I have gained while migrating these data warehouses to Azure Synapse dedicated SQL pool.

Dedicated SQL pools are close enough to regular SQL Server that we make a lot of assumptions about them, some of which may be wrong.


The Power of Metadata-Driven Development

Koen Verbeeck lays out a recommendation:

In this blog post I’ll talk about another of those rules/mantras/patterns/maxims:

build once, add metadata

I’m not sure if I’m using the right words; I heard something similar in a session by Spark enthusiast Simon Whiteley. He said you should only write code once, but make it flexible and parameterized, so you can add functionality just by adding metadata somewhere. A good example of this pattern can be found in Azure Data Factory: by using parameterized datasets, you can build one flexible pipeline that can copy, for example, any flat file, no matter which columns it has. I have blogged about this:

Click through to learn more about the concept, as well as some tips on how you’d do that in various data movement products (e.g., SSIS, ADF, Logic Apps).
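
As a toy illustration of the “build once, add metadata” pattern (my sketch, not Koen’s or Simon’s code), here is one generic copy routine in Python driven entirely by metadata rows, so supporting a new feed means adding a row rather than writing new code. The file paths and delimiters are hypothetical:

```python
import csv
from pathlib import Path

# The metadata: in a real system this would live in a control table.
COPY_JOBS = [
    {"source": "landing/customers.csv", "target": "staging/customers.csv", "delimiter": ","},
    {"source": "landing/sales.txt",     "target": "staging/sales.csv",     "delimiter": "|"},
]

def copy_flat_file(job: dict) -> None:
    """Generic copy: reads any flat file, whatever its columns,
    and writes it to the target as standard comma-separated CSV."""
    with open(job["source"], newline="") as src:
        rows = list(csv.reader(src, delimiter=job["delimiter"]))
    Path(job["target"]).parent.mkdir(parents=True, exist_ok=True)
    with open(job["target"], "w", newline="") as dst:
        csv.writer(dst).writerows(rows)

for job in COPY_JOBS:
    copy_flat_file(job)
```

The same shape carries over to something like ADF: the parameterized dataset and pipeline play the role of the generic routine, and the metadata typically lives in a control table.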
