Press "Enter" to skip to content

Diving into Data Mesh

Ust Oldfield has a series on data mesh. First up is a primer on the topic:

What is a Data Mesh? Data Mesh is new and was introduced in a few articles by Zhamak Dehghani, starting in May 2019. The first article, How To Move Beyond a Monolithic Data Lake to a Distributed and Data Mesh, and the second article, Data Mesh Principles and Logical Architecture, form the foundational thought pieces on data meshes. Very simply, and crudely, a Data Mesh aims to overcome the deficiencies of previous generations of analytical architectures by decentralising the ownership and production of analytical data to the teams who own the data domain. It is a convergence of Distributed Domain Driven Architecture, Self-serve Platform Design, and Product Thinking with Data.

Essentially applying the learnings from operational systems, of applying domain driven design, so that ownership of data is domain oriented. For example, a CRM team will own the Customer domain and all the data within it. Instead of data from all domains flowing into a central data repository, the individual domains curate and serve their datasets in an easily consumable way. A much closer alignment between operational and analytical data.

Then Ust has a deep dive:

A foundational principle of data mesh is the decentralisation of ownership of data to those closest to it. Those closest to it are those in the business, often using the operational systems and also using analytical data. Responsibility and ownership of the data is devolved from a central function to business units and domains. Therefore any change to how a business domain organises itself is limited to the domain itself rather than impacting the entire organisation. This is referred to as the bounded domain context.

Teams responsible for the business operation, e.g. CRM team responsible for Customers, are also responsible for the ownership and serving of the analytical data relating to their domain.

I probably need to spend more time thinking about it, but this feels like a rickety ladder of incentive compatibility problems. I agree with Ust’s characterization of warehouses as staid and that relatively little has happened since the Kimball model came to prominence, but this sounds like taking the problems of the Kimball model (slow to change, an iceberg of code & ETL, etc.) and effectively saying “Hey, here are slightly different views of the source systems, have fun” instead. That way, everybody gets to experience the joy of non-conforming dimensions, disagreeing facts, working with data in different grains, and seven terms for the same thing.