Architecture – Page 21

Data Mesh and Ownership Strategies

Published 2021-07-27 by Kevin Feasel

James Serra aims to clear up some confusion:

I have done a ton of research lately on Data Mesh (see the excellent Building a successful Data Mesh – More than just a technology initiative for more details), and have some concerns about the paradigm shift it requires. My last blog tackled the one about Centralized vs decentralized data architecture. In this one I want to talk about centralized ownership vs decentralized ownership, along with another paradigm shift (or core principle) closely related to it, siloed data engineering teams vs cross-functional data domain teams.
First I wanted to mention there is a Data Mesh Learning slack channel that I have spent a lot of time reading and what is apparent is there is a lot of confusion on exactly what a data mesh is and how to build it. I see this as a major problem as the more difficult it is to explain a concept the more difficult it will be for companies to successfully build that concept, so the promise of a data mesh improving the failure rates for big data projects will be difficult to achieve if we can’t all agree exactly what a data mesh is. What’s more is the core principles of the data mesh sound great in theory but will have challenges in implementing them, hence my thoughts in this blog on centralized ownership vs decentralized ownership.

Read on for James’s take on the matter.

Comments closed

Centralized and Decentralized Data Architectures

Published 2021-07-02 by Kevin Feasel

James Serra looks at a pattern:

A centralized data architecture means the data from each domain/subject (i.e. payroll, operations, finance) is copied to one location (i.e. a data lake under one storage account), and that the data from the multiple domains/subjects are combined to create centralized data models and unified views. It also means centralized ownership of the data (usually IT). This is the approach used by a Data Fabric.
A decentralized distributed data architecture means the data from each domain is not copied but rather kept within the domain (each domain/subject has its own data lake under one storage account) and each domain has its own data models. It also means distributed ownership of the data, with each domain having its own owner.
So is decentralized better than centralized?

Read on for James’s answer, and allow me to include a Dilbert cartoon so old, the boss didn’t even have pointy hair yet.

How Decentralized Organizations Can be Effective | The Fourth Revolution Blog

Comments closed

Centralized Data Modeling via Power BI Templates

Published 2021-05-12 by Kevin Feasel

Haroon Ashraf aims to square the circle:

Data modeling is the way you can arrange and link your organizational data (typically in the form of tables) for reporting and analysis.
In other words, it is the strategy of lining tables with each other to get useful information by following the standard practices and domain knowledge of the organization.
Traditionally, it stands for implementing the star or snowflake schema from the perspective of the data warehouse BI solution.
What is Centralized Data Modeling?
Centralized data modeling means a generic data model consisting of some commonly used tables, relationships, and hierarchies that are shared across the organization. These elements the starting point for Power BI report development to anyone eligible, interested, and capable to do so.

With that in mind, read on to learn how you can use Power BI templates to bring this about. I joke about squaring the circle here because if you treat Power BI as a self-service business intelligence tool, the users may not be totally familiar with what you’re doing and could end up accidentally undermining your plans. That said, it’s a good approach to solving this common problem.

Comments closed

A Logical Architecture for Azure Synapse Analytics

Published 2021-05-06 by Kevin Feasel

Paul Andrew lays out some thoughts:

As a community let’s all start thinking about this given the rich unified environment that Synapse Analytics offers. Below is a picture I’ve drawn as a starting point for a complete Synapse logical architecture. What do you think?

Click through for the image, which looks pretty solid to me.

Comments closed

Superkeys

Published 2021-04-14 by Kevin Feasel

Kevin Wilkie knows that not all keys wear capes:

Well, that’s because we sometimes need different ways to describe what we’ve got going on. Of the four different types of keys we’ve discussed so far, they are all different enough that we need to differentiate them and be able to explain what the differences are. For the rest of the keys that we’ll go through today, the same idea exists. They are close to the others but different enough that there is a need for another name for that type of key.
There is a super key.

Read on to learn what a superkey is. That will put you one quarter of the way to understanding Boyce-Codd Normal Form: a relational variable is in Boyce-Codd Normal Form if and only if all functional dependencies have superkey determinants.

Comments closed

Delivering Data Insights using the Microsoft Data Platform

Published 2021-03-05 by Kevin Feasel

Paul Andrew has a talk:

Let’s start with a story, not a ‘once upon a time story‘, a story for your backlog
As a solution architect
I need to design and build an Azure data analytics platform end to end
to deliver data insights for my customer.
In February 2021 I delivered a talk as part of the Scottish Summit conference on how you could/should build an end to end data platform solution in Azure to deliver data insights and analytics. This is one of my favourite sessions so thought it worth re-sharing the recording here.

Click through for the abstract as well as the video.

Comments closed

Soft Deletes in SQL Server

Published 2021-01-13 by Kevin Feasel

Erik Darling has some thoughts on soft deletes:

Implementing soft deletes for an app that’s been around for a while can be tough. In the same way as implementing Partitioning can be tough to add in later to get data management value from (rebuilding clustered indexes on the scheme, making sure all nonclustered indexes are aligned, and all future indexes are too, and making sure you have sufficient partitions at the beginning and end for data movement).

Read the whole thing. Incidentally, this also ties well into a recent post by Erik about deleting into a different table. It can be easier to implement soft deletes as deleting from the current table and adding to an archive table. That gives you the benefits of keeping deleted data while not running into some of the problems Erik mentions. And if you want to undo a deletion? Delete from the archive table and insert back into the main table.

Comments closed

When to Use Event Sourcing

Published 2021-01-07 by Kevin Feasel

Vikas Hazrati takes us through the pros and cons of using event sourcing for a project:

You would always get a ton of literature on Event Sourcing and CQRS. The key question is WHEN do you use it? Under what circumstances? Is your problem really in need of ES?
I would not go into the details of what Event Sourcing and CQRS is. The industry stalwarts have covered that in adequate detail. This post delves into battle-tested scenarios on where we should have used and otherwise ignored ES.

Click through for an analysis of pros and cons, as well as some advice on what it all means.

Comments closed

Power BI End-to-End Diagram Update

Published 2021-01-05 by Kevin Feasel

Melissa Coates has an update for us:

An updated version of the Power BI End-to-End diagram is now available as of January 4, 2021.

This update is as of January 4th and includes things like Azure Purview. Melissa also shows how it has evolved over time, which is particularly interesting for Power BI given its rate of change.

Comments closed

Moving Away from the Lambda Architecture

Published 2020-12-11 by Kevin Feasel

Xiang Zhang and Jingyu Zhu talk about migrating a project away from the Lambda architecture:

The Lambda architecture has become a popular architectural style that promises both speed and accuracy in data processing by using a hybrid approach of both batch processing and stream processing methods. But it also has some drawbacks, such as complexity and additional development/operational overheads. One of our features for Premium members on LinkedIn, Who Viewed Your Profile (WVYP), relied on a Lambda architecture for some time. The backend system supporting this feature had gone through a few architectural iterations in the past years: it started as a Kafka client processing a single Kafka topic, and eventually evolved to a Lambda architecture with more complicated processing logic. However, in an effort to pursue faster product iteration and lower operational overheads, we recently underwent a transition to make it Lambda-less. In this blog post, we’ll share some of the lessons learned in operating this system in the Lambda architecture, the decisions made in transitioning to Lambda-less, and the shifts necessary to undergo this transition.

When Lambda was first proposed back in 2015, it was intended as a compromise architecture trying to solve several important problems with the tools available in 2015 (well, 2013 and 2014—it was in a book, after all). I could definitely see the architecture fall into disuse within the next decade, not because it was at all bad, but because the world around it changed to the point that there is a better compromise available.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28

Category: Architecture