Press "Enter" to skip to content

Category: Architecture

Lakehouse, Mesh, and Fabric

James Serra is back in blue:

(NOTE: I have returned to Microsoft and am working as a Solution Architect in Microsoft Industry Solutions, formerly known as Microsoft Consulting Services (MCS), where I help customers build solutions on Azure. Contact your Microsoft account executive for more info. That being said: the views and opinions in this blog are mine and not those of Microsoft).

There certainly has been a lot of discussion lately on the topic of Data Lakehouse, Data Mesh, and Data Fabric, and how they compare to the Modern Data Warehouse. There is no clear definition of all these data architectures, and I have created a presentation using my own take that I have been presenting frequently internally at Microsoft and externally to customers and at conferences. Hopefully these presentations, blog posts, and videos can help clarify all these data architectures for you:

Click through for several useful resources to help differentiate these topics.

Comments closed

Data Mesh in Azure: Self-Service Infrastructure

Paul Andrew continues a series on applying data mesh principles in Azure:

This principle is very broad, so I want to break down the theory vs practice as before. The idea of self-service is always a goal in any data platform and the normal thing for analytics is to focus on this within the context of our data consumption, whereby a semantic layer technology can be used in a friendly, business-oriented, drag-and-drop type environment to create dashboards or whatever.

However, my interpretation of ‘self-serve’ for a data mesh architecture goes further than just the dashboard creation use case. This should not just apply at the data consumption layer, but all layers within the solution and, for clarity, not just related to the data itself. Hence the term in this principle: ‘data infrastructure as a platform’. This then unlocks the deeper implication of this serving for a data product: all abstracts of the platform can be consumed in a self-service manner from a series of predefined assets. Let’s think about this serving more like an internal marketplace or catalogue of assets for delivering everything the data product needs to enable a new node within the wider data mesh.

Read on for some deep thoughts on the topic.

Comments closed

Defining Technical Debt

Paul Andrew has thoughts on technical debt:

A few times now I’ve been asked to define technical debt. It can be an ugly term if your role is a project manager or scrum master. But, for me, as a more technically minded person I see this debt as a very normal thing that in an agile delivery team can be managed. However, before we can manage it, I’d like to use this blog post to define it (and get my own thoughts in order).

Read the whole thing. If you want a bit more on the topic, I have a post sharing my thoughts. Reading Paul’s post, I think there’s a lot of common ground in our ways of thought on this.

Comments closed

Using Proprietary Database Features

Lee Markum does some thinking:

I was recently thinking of SQL Server temporal tables and how there is a perspective that you shouldn’t use proprietary features of a product because it locks you into that product. I want to be your guide on this matter.

I agree completely with Lee’s take on this. I can count on one hand the number of platform migrations I’ve been a part of through the years. These are painful enough experiences even if you use 100% “portable” code because optimization patterns change. And then you get to the utterly absurd: indexes in Oracle and SQL Server behave differently (e.g., clustered indexes are a practical must in SQL Server and the equivalent to clustered indexes in Oracle is so rarely used that it’s not even worth talking about to most Oracle people), so are you going to avoid indexes so you can have “truly” portable code? If the answer to that question is “yes,” you are wrong.

I do understand that there are companies which create code bases which need to be installable on multiple platforms. In that case, I do understand trying to keep things as common as possible, especially for fairly simple apps. But this is also part of why vendor databases are, as a general rule, awful.
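For anyone who hasn't tried the feature Lee mentions, here is a minimal sketch of a system-versioned temporal table; the table and column names are made up for illustration, but it shows the sort of proprietary convenience you give up in the name of portable code:

```sql
-- Minimal sketch: a system-versioned temporal table in SQL Server (2016+).
-- Table and column names are hypothetical, purely for illustration.
CREATE TABLE dbo.Product
(
    ProductID   INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    ProductName NVARCHAR(100) NOT NULL,
    ListPrice   DECIMAL(10, 2) NOT NULL,
    ValidFrom   DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo     DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductHistory));

-- Point-in-time query: what did this product look like at the start of 2021?
SELECT ProductID, ProductName, ListPrice
FROM dbo.Product
    FOR SYSTEM_TIME AS OF '2021-01-01'
WHERE ProductID = 42;
```

Rolling your own history tables and triggers to stay "portable" means re-implementing all of that by hand on every platform anyway.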

Comments closed

The Evolution of Cloud Architecture

Ben Brauer has a two-parter looking at how architecture is changing. Part 1 looks at containers and machine learning:

Let’s start by describing containers at a high level. A container is a packaging and distribution mechanism that abstracts and resolves many of the installer issues that result from ‘unique’ environments. We’ve all heard developers exclaim “well, it works on my machine,” after pushing an application to a new environment only to realize it’s broken. Containers strive to address this problem by creating a hard boundary between the infrastructure and the software stack used by an application. External dependencies are not necessarily added to the container, but all your internal dependencies (frameworks, runtimes, etc.) are there. This makes the deployment of the application to a new environment significantly more predictable, as the compute environment is consistent because it’s part of the container.

Part two looks at serverless compute and low-code/no-code development:

Low-code (or no-code) development for applications is not a new concept. It strives to democratize development in a similar way as, decades ago, Visual Basic expanded the number of developers from thousands of C++ developers to hundreds of thousands of developers creating Windows-based solutions. Low-code takes this concept to non-technical professionals. Although this notion is great for productivity and usability, the maintenance and performance of these apps can be daunting to say the least. Now non-technical application authors need to learn about application management, documentation, and application deployment. Without a clear understanding of these considerations, the environment can quickly become chaotic. The good news is that platforms and tools have come a long way since Visual Basic. For example, Microsoft’s Power Apps platform provides many of the platform services needed to maintain a healthy application lifecycle and governance paradigm.

These are good concepts to know about, regardless of your particular cloud platform.

Comments closed

Secondary and Tertiary Data Mesh Interfaces in Azure

Paul Andrew continues a series on implementing data mesh with Azure:

When thinking about our node edges in part 2 I also made the statement about a primary set of node interfaces. In my initial drawings I alluded to this then capturing what I’ve called the PaaS Plane, suggesting the Azure Resource type used.

Building on this understanding I want to cover off the remaining edge use cases by exploring the other interface types we will typically need for the nodes of our data mesh architecture.

This has been a rather informative series on a topic I knew very little about coming in.

Comments closed

Building Data Mesh Edges in Azure

Paul Andrew continues a series on data mesh in Azure:

Anyway, moving on. In part 2 of this blog series, we keep the same focus from part 1, with the first data mesh principle. Let’s take our nodes and start thinking about the edges. The data mesh – data product interfaces… Enter my Azure Resource Group with arms/antenna type things, seen on the right.

Caveat: as you may have already gathered, I’m going to use the terms edge and interface a lot in this post. The meaning in the context of the data mesh is the same. Nodes with edges, nodes with interfaces.

Click through for more detail.

Comments closed

Building a Data Mesh in Azure

Paul Andrew starts a new series:

The concepts and principles of a data mesh architecture have been around for a while now and I’ve yet to see anyone else apply/deliver such a solution in Azure. I’m wondering if the concepts are so abstract that it’s hard to translate the principles into real world requirements, and maybe even harder to think about what technology you might actually need to deploy in your Azure environment.

Given this context (and certainly no fear of going first with an idea and being wrong) here’s what I think we could do to build a data mesh architecture in the Microsoft cloud platform – Azure.

Click through for Paul’s take on the first data mesh principle.

Comments closed

Organizing Synapse Workspaces and Lakehouses

Jovan Popovic confirms that Microsoft is using the term “Lakehouse” like Databricks does:

The lakehouse pattern enables you to keep a large amount of your data in Data Lake and to get the analytic capabilities without a need to move your data to some data warehouse to start an analysis. A lakehouse represents a good trade-off between query performance and the ability to access the latest version of data without the need to wait for data to be reloaded.

Azure Synapse Analytics workspace enables you to implement the Lakehouse pattern on top of Azure Data Lake storage.

When you think about your lakehouse solution, be aware that there are two options for creating databases over the lake:

Lake databases that are created using Spark or database template

SQL databases that are created using serverless SQL pools on top of data lake.

Although you might use different tools and languages to create these types of databases, the principles described in this article apply to both types. I will use the term “lakehouse” whenever I reference a Spark Lake database or a SQL database created using the serverless SQL pools.

Click through for Jovan’s guidance.
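As a rough illustration of the second option, here is a hypothetical sketch of a serverless SQL pool database with a view over Parquet files sitting in the lake; the storage account, path, and object names are assumptions for illustration, not anything from Jovan's article:

```sql
-- Minimal sketch of the serverless SQL pool flavor of a lakehouse database.
-- The storage URL and names below are hypothetical.
CREATE DATABASE SalesLakehouse;
GO
USE SalesLakehouse;
GO
-- A view over Parquet files in the data lake; no data movement, queried in place.
CREATE VIEW dbo.DailySales AS
SELECT *
FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/sales/daily/*.parquet',
        FORMAT = 'PARQUET'
    ) AS src;
```

Lake databases created from Spark follow the same spirit, just with Spark as the engine that defines the tables.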

Comments closed

A Case for EAV?

Erik Darling makes the case:

EAV styled tables can be excellent for certain data design patterns, particularly ones with a variable number of entries.

Some examples of when I recommend it are when users are allowed to specify multiple things, like:

I’m not sure I agree on the examples. When there are specific known things with expected shapes, I’d rather have a separate entity to model each. Even if each table is a single string, I’d still like the separation for logical modeling purposes.

That said, there are cases when EAV ends up being the best approach (unfortunately), particularly when you don’t even know the types of things a customer would wish to include. Just try to fight back hard when the inevitable request comes in to pivot all of that data.
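To make that trade-off concrete, here is a minimal, hypothetical sketch of an EAV-style table and the sort of pivot query the inevitable request tends to produce; the attribute names are invented for illustration:

```sql
-- Minimal sketch of an EAV-style table: one row per entity/attribute pair.
-- All names are hypothetical.
CREATE TABLE dbo.CustomerAttribute
(
    CustomerID     INT           NOT NULL,
    AttributeName  NVARCHAR(100) NOT NULL,
    AttributeValue NVARCHAR(400) NULL,
    CONSTRAINT PK_CustomerAttribute PRIMARY KEY (CustomerID, AttributeName)
);

-- The "inevitable request": turn attribute rows back into columns.
SELECT CustomerID,
       MAX(CASE WHEN AttributeName = 'FavoriteColor' THEN AttributeValue END) AS FavoriteColor,
       MAX(CASE WHEN AttributeName = 'ShoeSize'      THEN AttributeValue END) AS ShoeSize
FROM dbo.CustomerAttribute
GROUP BY CustomerID;
```

Every attribute you pivot out is another hand-written column, which is exactly why that request gets painful as the list of attributes grows.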

Comments closed