Category: Architecture

The Current Status of the Lakehouse Architecture

Paul Turley is happy:

When I first started attending conference and user group sessions about Lakehouse architecture, I didn’t get it at first, but I do now; and it checks all the boxes. As a Consulting Services Director in a practice with over 200 BI developers and data warehouse engineers, I see first-hand how our customers – large and small – are adopting the Lakehouse for BI, Data science and operational reporting.

Read on for Paul’s thoughts. My main concern with the strategy has always been performance, with the expectation that it’d take a few years for lakehouse systems to be ready for prime time. We’re getting close to the end of that window (back in 2020, I believe I estimated 2024-2025).

Data Mesh Q&A

Jean-Georges Perrin hosts another Q&A:

As part of the Data Mesh Learning Community, Eric Broda invited Laveena Kewlani, Kruthika Potlapally, and me to discuss the implementation of Data Mesh at PayPal. As expected, the session went longer than scheduled, and some questions remained open. As with the previous Q&A sessions ([#1] and [#2]), here is an attempt to answer them.

Click through for the questions, as well as the answers.

PayPal’s Data Contract Template Open Sourced

Jean-Georges Perrin makes an announcement:

A data contract is a binding agreement between the consumers and producers of data. You can see it as a data schema on steroids or data schema++. The goal of the contract is to set expectations between the parties. It can be built as fit-for-purpose where the consumers and producer agree on what it should contain or can serve as a brochure for any consumer willing to access the data offered by this (data) product.

Click through to learn more about data contracts and then check out the contract template itself on PayPal’s GitHub repo.
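To make "a data schema on steroids" a little more concrete, here is a minimal sketch of a contract as a Python structure, plus a check that a batch of records honors it. The field names (`dataset`, `owner`, `columns`, `max_null_fraction`) and the validation rules are hypothetical illustrations and are not taken from PayPal's template; the point is only that a contract carries expectations (nullability, a quality threshold) beyond the bare column list.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ColumnSpec:
    name: str
    type_: type  # expected Python type for non-null values
    nullable: bool = True


@dataclass(frozen=True)
class DataContract:
    # Hypothetical fields: a schema plus promises that go beyond the schema.
    dataset: str
    owner: str
    columns: tuple[ColumnSpec, ...]
    max_null_fraction: float = 0.0  # applies to each nullable column


def validate(contract: DataContract, rows: list[dict[str, Any]]) -> list[str]:
    """Return human-readable violations; an empty list means the batch honors the contract."""
    violations: list[str] = []
    for i, row in enumerate(rows):
        for col in contract.columns:
            if col.name not in row:
                violations.append(f"row {i}: missing column '{col.name}'")
            elif row[col.name] is None:
                if not col.nullable:
                    violations.append(f"row {i}: '{col.name}' must not be null")
            elif not isinstance(row[col.name], col.type_):
                got = type(row[col.name]).__name__
                violations.append(f"row {i}: '{col.name}' expected {col.type_.__name__}, got {got}")
    # Quality expectation checked over the whole batch, not row by row
    if rows:
        for col in contract.columns:
            if not col.nullable:
                continue  # nulls in these columns were already reported per row
            nulls = sum(1 for r in rows if col.name in r and r[col.name] is None)
            fraction = nulls / len(rows)
            if fraction > contract.max_null_fraction:
                violations.append(
                    f"'{col.name}': null fraction {fraction:.2f} exceeds {contract.max_null_fraction:.2f}"
                )
    return violations


contract = DataContract(
    dataset="payments",  # hypothetical dataset and owner
    owner="payments-team",
    columns=(
        ColumnSpec("payment_id", str, nullable=False),
        ColumnSpec("amount", float, nullable=False),
        ColumnSpec("note", str),
    ),
    max_null_fraction=0.5,
)
good = [{"payment_id": "p1", "amount": 9.99, "note": None},
        {"payment_id": "p2", "amount": 5.0, "note": "gift"}]
bad = [{"payment_id": None, "amount": "ten", "note": None}]

print(validate(contract, good))  # []
print(validate(contract, bad))
```

In a real setup the contract would live in a versioned file agreed on by producer and consumers, and a check like this would run in the producer's pipeline before data is published.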

Portfolio Management for Creating a Technology Strategy

Kevin Sookocheff busts out the 2×2 matrix:

Application Portfolio Management (APM) draws inspiration from financial portfolio management, which has been around since at least the 1970s. By looking at all applications and services in the organization and analyzing their costs and benefits, you can determine the most effective way to manage them as part of a larger overall strategy. This allows the architect or engineering leader to take a more strategic approach to managing their application portfolio backed by data. Portfolio management is crucial for creating a holistic view of your team’s technology landscape and making sure that it aligns with business goals.

This is for C-levels and VPs rather than individual contributors, but acts as a good way of thinking about a portfolio of applications and what to do with each.
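One widely used version of this 2×2 is Gartner's TIME model, which plots business value against technical fit and assigns each quadrant an action: Tolerate, Invest, Migrate, or Eliminate. The sketch below assumes that model, a 0-10 scoring scale, and a single threshold; the linked post's axes and labels may differ, and the application names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    TOLERATE = "Tolerate"
    INVEST = "Invest"
    MIGRATE = "Migrate"
    ELIMINATE = "Eliminate"


@dataclass(frozen=True)
class Application:
    name: str
    business_value: float  # assumed 0-10 score
    technical_fit: float   # assumed 0-10 score


def classify(app: Application, threshold: float = 5.0) -> Action:
    """Place an application in one quadrant of the value/fit 2x2."""
    high_value = app.business_value >= threshold
    high_fit = app.technical_fit >= threshold
    if high_value and high_fit:
        return Action.INVEST     # valuable and technically sound: keep building on it
    if high_value:
        return Action.MIGRATE    # valuable but technically weak: re-platform it
    if high_fit:
        return Action.TOLERATE   # sound but low value: keep it running, don't invest
    return Action.ELIMINATE      # neither: retire it


portfolio = [
    Application("billing", 9, 8),
    Application("legacy-crm", 8, 2),
    Application("intranet-wiki", 3, 7),
    Application("old-reporting", 2, 3),
]
for app in portfolio:
    print(f"{app.name}: {classify(app).value}")
```

The hard part in practice is producing defensible scores, which is where the cost and benefit data the post talks about comes in; the classification itself is trivial once the scores exist.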

Checklist for a Snowflake Migration

Sandeep Arora has a checklist for us:

We have broken our Snowflake Migration Checklist into nine phases to help plan and execute an end-to-end migration of the existing traditional data platform to Snowflake. These phases will help align migration resources and efforts; however, this doesn’t necessarily mean that all steps should be executed sequentially. Some phases, like “Train Users,” can be executed parallel to other phases.

At a high level, the process isn’t Snowflake-specific. Really, 6 of the 9 steps are generic supporting steps which would apply to any major project. This makes the checklist a good starting point not only for a Snowflake migration, but also for any major migration project.
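The point that phases need not run strictly in sequence is a dependency-scheduling problem: each phase waits only on the phases it truly depends on, and anything whose prerequisites are done can run in parallel. Here is a minimal sketch using Python's standard `graphlib` module. Apart from "Train Users," the phase names and dependencies are hypothetical and are not the article's nine phases.

```python
from graphlib import TopologicalSorter

# Hypothetical phases; only "Train Users" comes from the article.
# Each key maps to the set of phases that must finish before it can start.
dependencies: dict[str, set[str]] = {
    "Assess": set(),
    "Design": {"Assess"},
    "Migrate Data": {"Design"},
    "Migrate Pipelines": {"Design"},
    "Train Users": {"Design"},
    "Validate": {"Migrate Data", "Migrate Pipelines"},
    "Cut Over": {"Validate", "Train Users"},
}


def waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group phases into waves; every phase within a wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a circular dependency
    result: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        result.append(ready)
        sorter.done(*ready)
    return result


for i, wave in enumerate(waves(dependencies), start=1):
    print(f"Wave {i}: {', '.join(wave)}")
```

With these assumed dependencies, "Train Users" lands in the same wave as the data and pipeline migration work rather than waiting for it.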

An Overview of the Kappa Architecture

Amian Patnaik provides an overview:

The Kappa Architecture, introduced by Jay Kreps, co-founder of Confluent, is designed to handle real-time data processing in a scalable and efficient manner. Unlike the traditional Lambda Architecture, which separates data processing into batch and stream processing, the Kappa Architecture promotes a single pipeline for both batch and stream processing, eliminating the need for maintaining separate processing pipelines.

What’s interesting to me is that Lambda, an architecture that was an explicit product of its time (a compromise that tried to do two things at once when the hardware and tooling of the day couldn’t handle both in a single system), is still thriving today. Kappa, meanwhile, isn’t an architectural style that people throw around much anymore, at least in the circles I run in.
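The core Kappa idea is that there is one processing path, and "batch" recomputation becomes replaying the log from the beginning through that same streaming code into a new output. Below is a minimal sketch, assuming an in-memory list stands in for a durable, replayable log such as a Kafka topic; the class and function names are hypothetical.

```python
from collections import defaultdict
from collections.abc import Callable, Iterator
from dataclasses import dataclass, field

Event = dict[str, object]  # e.g. {"user": "a", "amount": 10.0}


@dataclass
class Log:
    """Append-only event log; a stand-in for a durable, replayable topic."""
    events: list[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)

    def read_from(self, offset: int) -> Iterator[tuple[int, Event]]:
        for i in range(offset, len(self.events)):
            yield i, self.events[i]


class StreamJob:
    """The single processing path: read events in order and update a materialized view."""

    def __init__(self, log: Log, update: Callable[[dict, Event], None]) -> None:
        self.log = log
        self.update = update
        self.view: defaultdict[str, float] = defaultdict(float)
        self.offset = 0  # position of the next event to read

    def poll(self) -> None:
        for i, event in self.log.read_from(self.offset):
            self.update(self.view, event)
            self.offset = i + 1


def total_spend(view: dict, e: Event) -> None:
    view[e["user"]] += e["amount"]


def spend_excluding_refunds(view: dict, e: Event) -> None:
    if e["amount"] > 0:
        view[e["user"]] += e["amount"]


log = Log()
for e in [{"user": "a", "amount": 10.0}, {"user": "b", "amount": 5.0}, {"user": "a", "amount": -3.0}]:
    log.append(e)

v1 = StreamJob(log, total_spend)
v1.poll()
# Business logic changed: instead of a separate batch job, replay the log from offset 0 into a new view
v2 = StreamJob(log, spend_excluding_refunds)
v2.poll()
print(dict(v1.view))  # {'a': 7.0, 'b': 5.0}
print(dict(v2.view))  # {'a': 10.0, 'b': 5.0}

# New events reach the new job through the same code path; once it has caught up, v1 can be retired
log.append({"user": "b", "amount": 2.0})
v2.poll()
```

Contrast this with Lambda, where the same business logic would be written twice, once for the batch layer and once for the speed layer, and the two results merged at query time.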

Well-Architected Framework Cost Optimization

Brandon Wilson cuts costs:

Hi everyone! Brandon Wilson (Cloud Solution Architect/Engineer) here to follow up on the post I authored previously for the Well-Architected Cost Optimization Assessment offering, with another customer offering we have known as the Well-Architected Cost Optimization Implementation. This offering can be considered as a continuation/”part 2” of sorts for the Well-Architected Cost Optimization Assessment, where the goal is to help you implement some of the findings relating to Azure Reservations, Azure Savings Plans, Azure Hybrid Benefits, along with cleaning up some of that cloud waste sitting around.

Just as before (and in case you are a new reader), we’ll touch a little bit on the Azure Well-Architected Framework (WAF), along with the Cloud Adoption Framework (CAF), and then go over what is covered in the Well-Architected Cost Optimization Implementation offering itself.

Some of this is Microsoft-internal tooling, though the WAF assessments themselves are available to the general public and well worth going through.

Landing Zone Layouts for Modern Data Warehouses

Paul Hernandez builds out a landing zone for a warehouse:

In this article I want to discuss some different layout options for a landing zone in a modern cloud data warehouse architecture. With landing zone, I mean a storage account where raw data lands directly from its source system (not to be confused with a landing zone to move a system or application into the cloud).

One of the things I appreciate a lot about this post is that it covers the history, showing us how we got to where we are. Paul’s well-versed in each step along the way and lays things out clearly.
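As a small illustration of what "layout" means here, the sketch below builds the relative path where a raw extract lands under two common conventions: source-first and date-first. These two options and the function name are assumptions for illustration and are not necessarily the layouts Paul compares.

```python
from datetime import datetime, timezone
from enum import Enum


class Layout(Enum):
    SOURCE_FIRST = "source_first"  # <source>/<entity>/yyyy/mm/dd/<file>
    DATE_FIRST = "date_first"      # yyyy/mm/dd/<source>/<entity>/<file>


def landing_path(layout: Layout, source: str, entity: str,
                 loaded_at: datetime, file_name: str) -> str:
    """Build the relative path where a raw extract lands, under an assumed convention."""
    date_part = loaded_at.strftime("%Y/%m/%d")
    if layout is Layout.SOURCE_FIRST:
        parts = [source, entity, date_part, file_name]
    else:
        parts = [date_part, source, entity, file_name]
    return "/".join(parts)


ts = datetime(2023, 5, 14, 6, 30, tzinfo=timezone.utc)
print(landing_path(Layout.SOURCE_FIRST, "erp", "customers", ts, "customers.parquet"))
# erp/customers/2023/05/14/customers.parquet
print(landing_path(Layout.DATE_FIRST, "erp", "customers", ts, "customers.parquet"))
# 2023/05/14/erp/customers/customers.parquet
```

The trade-off is in what sits under a single prefix: source-first keeps the full history of one entity together, which makes reprocessing one table easy, while date-first keeps everything that landed on a given day together, which makes daily loads and retention easy.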

Data Pipelines and Data Mesh

Jean-Georges Perrin answers a burning question:

I keep having questions about data pipelines. Data pipelines in Data Mesh is a topic I should tackle. So… Is the data pipeline the root of all evil?

Jean-Georges’s answer is quite in line with one of my favorite phrases: “Short answer: no, with an ‘if’; long answer: yes, with a ‘but.’” Read on for some thoughts on data pipelines and what the data mesh concept does to minimize harm.

Building a Dimension and Measure Matrix for Power BI

Olivier Van Steenlandt does some documentation:

In this blog post, I will guide you through all the required steps to get a Data Model Relationship Matrix in Power BI.

If you don’t know what I mean, I would like to have a straightforward overview where I can see which attribute groups and measure groups I can combine from my Tabular Model in (SQL Server) Analysis Server.

The first thing I thought of was “this is very much like a bus matrix in the Kimball model.” It’s a little different, though, as the rows pertain to measure groups rather than business processes.
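The logic behind such a matrix is reachability: a dimension can slice a measure group if you can walk from the fact table to that dimension by following many-to-one relationships. Olivier builds this in Power BI; the sketch below is only a language-neutral illustration in Python, not his method. It assumes a hypothetical model, relationships listed from the "many" side to the "one" side, and single-direction filtering (bidirectional relationships are not handled).

```python
from collections import defaultdict

# Hypothetical model: each pair is (many side, one side) of a relationship.
relationships = [
    ("Sales", "Date"),
    ("Sales", "Product"),
    ("Sales", "Customer"),
    ("Inventory", "Date"),
    ("Inventory", "Product"),
    ("Product", "Product Category"),  # snowflaked dimension, reached via Product
]
measure_groups = ["Sales", "Inventory"]
dimensions = ["Date", "Product", "Product Category", "Customer"]


def reachable(start: str, edges: dict[str, list[str]]) -> set[str]:
    """All tables reachable from `start` by following many-to-one relationships."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        table = stack.pop()
        for nxt in edges.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def build_matrix(rels: list[tuple[str, str]], facts: list[str],
                 dims: list[str]) -> dict[str, dict[str, bool]]:
    edges: dict[str, list[str]] = defaultdict(list)
    for many_side, one_side in rels:
        edges[many_side].append(one_side)
    matrix: dict[str, dict[str, bool]] = {}
    for fact in facts:
        related = reachable(fact, edges)  # compute once per measure group
        matrix[fact] = {d: d in related for d in dims}
    return matrix


matrix = build_matrix(relationships, measure_groups, dimensions)
width = max(len(f) for f in measure_groups)
print(" " * width + " | " + " | ".join(dimensions))
for fact, row in matrix.items():
    cells = " | ".join(("X" if row[d] else "").center(len(d)) for d in dimensions)
    print(f"{fact:<{width}} | {cells}")
```

Rows are measure groups and columns are dimensions, so an empty cell (Inventory and Customer here) flags a combination that would return unfiltered or misleading numbers in a report.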