Category: Architecture

Multi-Developer Power BI Development

Reza Rad architects a solution for multiple developers working on a Power BI project:

Before I start explaining the architecture, it is important to understand the challenge and think about how to solve it. The default usage of Power BI involves getting data imported into the Power BI data model and then visualizing it. Although there are other modes and other connection types, importing data is the most popular option. However, there are some challenges with a model and a PBIX file that hold everything in one file. Here are some:

– Multiple developers cannot work on one PBIX file at the same time. Multi-Developer issue.

– Integrating the single PBIX file with another application or dataset would be very hard. High Maintenance issue.

– All data transformations are happening inside the model, and the refresh time would be slower.

– The only way to expand visualization would be by adding pages to the model, and you will end up with hundreds of pages after some time.

– Every change, even a small change in the visualization, means deploying the entire model.

– Creating a separate Power BI file that references some parts of this model would not be possible; as a result, you would need to make a lot of duplicates, which leads to high maintenance issues again.

– If you want to re-use some of the tables and calculations of this file in other files in the future, it won’t be easy to maintain when everything is in one file.

– And many other issues.

After laying out all of the challenges, Reza puts together a plan to resolve them.

Data Mesh at Netflix

Bo Lei, et al., describe their Data Mesh architecture:

Real-time processing technologies (a.k.a. stream processing) are one of the key factors that enable Netflix to maintain its leading position in the competition to entertain our users. Our previous-generation streaming pipeline solution, Keystone, has a proven track record of serving several of our key business needs. However, as we expand our offerings and try out new ideas, there’s a growing need to unlock other emerging use cases that were not yet covered by Keystone. After evaluating the options, the team has decided to create Data Mesh as our next-generation data pipeline solution.

Click through for a high-level overview of the architecture.

Organizing Data Domains in a Data Mesh

Paul Andrew continues a series on data mesh architecture:

Defining an organisation hierarchy is always hard, even more so for large enterprises with massive amounts of interlock between business functions. In the context of data analytics, we attempt to tackle the problem by creating an organisation dimension as part of our star schema data model. This could include things like region, operating company, branch, department, team etc.

So, my friends, how do we go about handling this when considering a data mesh architecture and the de-centralised domains that support the natural scalability we crave? For me, it feels like we are just frontloading the dimensional modelling problem, tackling it from the beginning in the very foundations of our data platform. But, with a twist.

Read on for that twist and for some solid guidance on data domains in practice compared to the theory.
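
To make the organisation dimension from the excerpt a little more concrete, here is a minimal Python sketch of one attached to a fact table in a star schema. The class names, the sample companies, and the `total_by` helper are all made up for illustration and do not come from Paul's post; a real implementation would live in a database or semantic model rather than in Python lists.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OrgDimRow:
    """One row of a hypothetical organisation dimension."""
    org_key: int            # surrogate key referenced by fact rows
    region: str
    operating_company: str
    branch: str
    department: str
    team: str


@dataclass(frozen=True)
class SalesFact:
    """One row of a hypothetical fact table keyed to the dimension."""
    org_key: int
    amount: float


# Illustrative sample data only.
ORG_DIM = [
    OrgDimRow(1, "EMEA", "Contoso UK", "London", "Finance", "Payables"),
    OrgDimRow(2, "EMEA", "Contoso UK", "Leeds", "Sales", "Inside Sales"),
    OrgDimRow(3, "APAC", "Contoso AU", "Sydney", "Sales", "Field Sales"),
]

FACTS = [
    SalesFact(1, 100.0),
    SalesFact(2, 250.0),
    SalesFact(3, 400.0),
    SalesFact(2, 50.0),
]


def total_by(level: str) -> dict[str, float]:
    """Roll fact amounts up to one attribute of the organisation dimension."""
    by_key = {row.org_key: row for row in ORG_DIM}
    totals: dict[str, float] = {}
    for fact in FACTS:
        label = getattr(by_key[fact.org_key], level)
        totals[label] = totals.get(label, 0.0) + fact.amount
    return totals
```

Calling `total_by("region")` or `total_by("department")` slices the same facts along different levels of the hierarchy, which is the job the organisation dimension does in a centralised model. In a data mesh, the open question is which of those levels becomes a domain boundary.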

Data Architecture Questions to Ask

James Serra does some thinking:

As an example, if I were asked what product to use to store data in the Azure cloud, I could come up with at least a dozen options, so I need to ask questions to reduce the choices to the best fit for the customer’s situation. This will avoid what I have seen many times – a company chooses a particular product and, after their solution is built, they say the product is “terrible”, but they were using it for a use case that it was not designed for. The customer was not aware of a better product for their use case because “they don’t know what they don’t know”. That is why you should work with an expert architect as one of your first orders of business: the technology decisions at this early part of building a solution are vital to get correct, as finding out 6 months or one year later that you made the wrong choice and have to start over can lead to so much wasted time and money (and I have seen some shocking waste).

Read on for a slew of questions.

Starting a Data Mesh Project

Paul Andrew continues a series on data mesh:

A question I get asked a lot when creating a data mesh architecture is where to start. The consultant in me defaults the answer to ‘it depends’, of course.

However, in this blog post I want to give a better answer based on my experience of working with various customers so far. As always, the usual caveats apply: I’m happy to go first when trying to define a starting point for our data mesh delivery and fully accept that parts of this are probably wrong. This is also founded in the knowledge that every customer I’ve worked with is different, with different priorities and very subjective views on why they even need a data mesh architecture. Not to mention various levels of data platform maturity.

Paul also includes some nice roadmap and architectural box-drawing diagrams, so check those out.

Azure SQL Database and the Well-Architected Framework

Jason Bouska has a big announcement:

Microsoft Azure SQL Database is a fully managed cloud database (PaaS) that handles many database management tasks without user intervention. Tasks such as patching, upgrading, taking backups, and monitoring can be configured to the specific needs of the workload and are performed in the background. Azure SQL Database runs the latest stable version of SQL Server and patched OS with 99.99% availability. The intelligent automated functions built into the database free up the user to focus on other important tasks.

Today I am introducing the Azure Well-Architected Service Guide for Azure SQL Database. Like other service guides, this guide for Azure SQL Database contains design considerations, checklists, and detailed configuration recommendations that can assist cloud architects in deploying optimal Azure SQL workloads in line with the guiding tenets of the Well-Architected Framework: security, reliability, cost management, performance efficiency, and operational excellence.

I’ve found that the Well-Architected Framework (whose overloaded acronym is still annoying) works best once you’re far enough along that you have a good idea of workload characteristics, meaning it’s not for the pre-planning stage. Also, a full review might take hours or days and require several people to complete, not just a DBA.

The Basics of Snowflake Architecture

Arun Sirpal lays out the foundation of Snowflake DB’s architecture:

At the most basic level, Snowflake has 3 important components: the cloud services layer, the centralised storage layer and the compute layer.

Cloud services – they call this the “brains” of Snowflake. This is where infrastructure management takes place, where the (cost-based) optimiser lives, and where metadata management and security (authentication and access control) are handled.

Read on to learn about the other two layers and how they meet.

The Importance of a Proper Datamart / Data Warehouse

Teo Lachev explains why you want a datamart (or a data warehouse) for BI solutions:

I sent a proposal for implementing a classic BI solution: Azure SQL-based datamart (not Power BI datamart please), ETL, semantic model, and reports. The client had a sticker shock. Return to sender … as other BI companies that quoted can do it for half! Upon digging, it turned out the other companies would build the semantic model (aka Power BI dataset) directly on top of the data source. On a T&M basis, of course, what else? By contrast, I give fixed-price milestone-driven proposals and I don’t get paid unless I deliver and meet written and agreed upon success criteria, but that’s a different story.

So, let me count the ways, as the poet would say. It’s certainly technically possible to slap a dataset on top of the data source(s). That’s what self-service BI is all about, right … until it doesn’t serve anymore.

Read on for more detail.

Proofs of Concept and Pilots

Kenneth Fisher strikes a chord:

If your POC does not follow your company’s best practices and standards then it is not a valid POC.

There are way too many settings that will change its performance, cause security issues, etc. On top of that, almost every POC I’ve ever seen ends up becoming the test environment if not the actual production environment. So all of those little compromises end up in your actual, non-POC environment because it’s way too much work to fix them now. You should have said something when we set this up.

To use one of my favorite lines, “Short answer: yes with an if; long answer: no with a but.”

Before I get to the answer, I do want to differentiate between a proof of concept and a pilot. The idea of a proof of concept is to see if I can make this thing work. Can I get these two processes to talk to each other? Can I build a website which accepts user input and displays something? Can I get this idea from my head into code? Can I process 500,000 records per second using our existing hardware? One important thing about a proof of concept is that it always has the possibility of failure. “No” is a valid answer here based on the conditions. Also, you want that answer as fast as is reasonable so that your business decision-makers can make business decisions on that information. By contrast, a pilot is a starter for the full project. You might work with one business unit instead of all of them, migrate a small amount of traffic to the new system, or only handle data from a single branch office. When we do a pilot, we already know the answer is yes; we just need to build it out and answer the technical details along the way.

Returning to the line above: Yes, I agree with Kenneth if your company lacks the discipline to differentiate between proofs of concept and pilots (and that’s not as denigrating a remark as it sounds…though it’s somewhat denigrating). No, do not follow the same practices for a proof of concept that you would for a full product, but you need to ensure that code gets destroyed and you start over with new code which does follow those practices.

Have One Data Model per Business Area

James McGillivray offers us an important piece of advice:

I cannot stress this enough. If people are consuming your data in multiple places, the data needs to come from the same data model. That can be an Enterprise Data Warehouse, a Data Mart, a Power BI Model, or any other data source, but at some point you need to be able to track the data back to a single place. If you don’t do this, you will spend THE REST OF YOUR DAYS explaining the differences between the data models to business and customers, and reconciling the differences over and over again.

Read on to learn why this is so important.
