Press "Enter" to skip to content

Category: Architecture

Optimizing Azure Pricing for Storage and VMs

Shane Baldacchino continues a series on cost optimization in the cloud:

Cost. I have been fortunate to work for and help migrate one of Australia’s leading websites (seek.com.au) in to the cloud and have worked for both large public cloud vendors. I have seen the really good, and the not so good when it comes to architecture.

Cloud and cost. It can be quite a polarising topic. Do it right, and you can run super lean, drive down the cost to serve and ride the cloud innovation train. But inversely do it wrong, treat public cloud like a datacentre then your costs could be significantly larger than on-premises.

Click through for some good advice, including an appreciation of spot instances.

Comments closed

Building a Data Serving API in Azure

Justice Zishanhi has some recommendations for serving data in Azure:

Data is an important asset to all organizations big and small. As these organizations mature, building an end-to-end data platform to enable BI and AI at scale has become part of that journey. Some organizations, have the requirement to expose modelled data in a data warehouse or data lake (Azure Data Lake Storage Gen2) to downstream consumer applications (mobile or web apps) where access patterns can be unpredictable in respect to frequency of access and/or type of data that is requested.

Data warehouse engines and data lakes are not designed for singleton transactional (request / response) interactions.  To serve these requests at scale and to meet the different SLAs and access pattern unpredictability, data needs to be offloaded to a suitable database engine (i.e., a caching layer) that is built to serve such queries.  

The “Design Patterns” section of this article highlights a generalized pattern for implementing a data serving API which meets this requirement – consisting of a Data Platform component and an API component. For implementing the API, two patterns are commonly adopted – a synchronous pattern or an asynchronous pattern. Both are explored in the “API Implementation Patterns” section of this article.

The example focuses on Cosmos DB and provides quite a bit of helpful guidance.

Comments closed

Multi-Developer Power BI Development

Reza Rad architects a solution for multiple developers working on a Power BI project:

Before I start explaining the architecture, it is important to understand the challenge and think about how to solve it. The default usage of Power BI involves getting data imported into the Power BI data model and then visualizing it. Although there are other modes and other connection types, however, the import data is the most popular option. However, there are some challenges in a model and a PBIX file with everything in one file. Here are some;

– Multiple developers cannot work on one PBIX file at the same time. Multi-Developer issue.

– Integrating the single PBIX file with another application or dataset would be very hard. High Maintenance issue.

– All data transformations are happening inside the model, and the refresh time would be slower.

– The only way to expand visualization would be by adding pages to the model, and you will end up with hundreds of pages after some time.

– Every change, even a small change in the visualization, means deploying the entire model.

– Creating a separate Power BI file with some parts it referencing from this model would not be possible; as a result, you would need to make a lot of duplicates and high maintenance issues again.

– If you want to re-use some of the tables and calculations of this file in other files in the future, it won’t be easy to maintain when everything is in one file.

– And many other issues.

After laying out all of the challenges, Reza puts together a plan to resolve them.

Comments closed

Data Mesh at Netflix

Bo Lei, et al, describe their Data Mesh architecture:

Realtime processing technologies (A.K.A stream processing) is one of the key factors that enable Netflix to maintain its leading position in the competition of entertaining our users. Our previous generation of streaming pipeline solution Keystone has a proven track record of serving multiple of our key business needs. However, as we expand our offerings and try out new ideas, there’s a growing need to unlock other emerging use cases that were not yet covered by Keystone. After evaluating the options, the team has decided to create Data Mesh as our next generation data pipeline solution.

Click through for a high-level overview of the architecture.

Comments closed

Organizing Data Domains in a Data Mesh

Paul Andrew continues a series on data mesh architecture:

Defining an organisation hierarchy is always hard, even more so for large enterprises with massive amounts of interlock between business functions. In the context of data analytics, we attempt to tackle the problem by creating an organisation dimension as part of our star schema data model. This could include things like region, operating company, branch, department, team etc.

So, my friends, how do we go about handling this when considering a data mesh architecture and the de-centralised domains that support the natural scalability we crave. For me, it feels like we are just frontloading the dimensional modelling problem. Tackling it from the beginning in the very foundations of our data platform. But, with a twist.

Read on for that twist and for some solid guidance on data domains in practice compared to the theory.

Comments closed

Data Architecture Questions to Ask

James Serra does some thinking:

As an example, if I were asked what product to use to store data in the Azure cloud, I could come up with at least a dozen options, so I need to ask questions to reduce the choices to the best use case for the customers situation. This will avoid what I have seen many times – a company chooses a particular product and after their solution is built, they say the product is “terrible”, but they were using it for a use case that it was not designed for. But the customer was not aware of a better product for their use case because “they don’t know what they don’t know”. That is why you should work with an architect expert as one of your first order of business: the technology decisions at this early part of building a solution are vital to get correct, as finding out 6-months or one year later that you made the wrong choice and have to start over can lead to so much wasted time and money (and I have seen some shocking waste).

Read on for a slew of questions.

Comments closed

Starting a Data Mesh Project

Paul Andrew continues a series on data mesh:

A common question I get asked a lot when creating a data mesh architecture is where to start? The consultant in me defaults the answer to ‘it depends’, of course 

However, in this blog post I want to give a better answer based on my experience of working with various customers so far. As always, the usual caveats apply, I’m happy to go first when trying to define a starting point for our data mesh delivery and fully accept that parts of this are probably wrong. This is also founded in the knowledge that every customer I’ve worked with is different, with different priorities and very subjective views on why they even need a data mesh architecture. Not to mention various levels of data platform maturity.

Paul also includes some nice roadmap and architectural box-drawing diagrams, so check those out.

Comments closed

Azure SQL Database and the Well-Architected Framework

Jason Bouska has a big announcement:

Microsoft Azure SQL Database is a fully managed cloud database (PaaS) that handles many database management tasks without user intervention. Tasks such as patching, upgrading, taking backups, and monitoring can be configured to the specific needs of the workload and are performed in the background. Azure SQL Database runs the latest stable version of SQL Server and patched OS with 99.99% availability. The intelligent automated functions built into the database free up the user to focus on other important tasks.

Today I am introducing the Azure Well-Architected Service Guide for Azure SQL Database. Like other service guides, this guide for Azure SQL Database contains design considerations, checklists, and detailed configuration recommendations that can assist cloud architects in deploying optimal Azure SQL workloads in line with the guiding tenets of the Well-Architected Framework: security, reliability, cost management, performance efficiency, and operational excellence.

I’ve found that the Well-Architected Framework (whose overloaded acronym is still annoying) works best once you’re far enough along that you have a good idea of workload characteristics, meaning it’s not for the pre-planning state. Also, a full review might take hours or days and require several people to complete, not just a DBA.

Comments closed

The Basics of Snowflake Architecture

Arun Sirpal lays out the foundation of Snowflake DB’s architecture:

At the most basic level, Snowflake has 3 important components. The Cloud services layer, centralised storage layer and the compute layer.

Cloud services – they call this the “brains” of snowflake. This is where infrastructure management takes place, the optimiser is based (cost-based), metadata management and security (authentication and access control) are handled.

Read on to learn about the other two layers and how they meet.

Comments closed

The Importance of a Proper Datamart / Data Warehouse

Teo Lachev explains why you want a datamart (or a data warehouse) for BI solutions:

I sent a proposal for implementing a classic BI solution: Azure SQL-based datamart (not Power BI datamart please), ETL, semantic model, and reports. The client had a sticker shock. Return to sender … as other BI companies that quoted can do it for half! Upon digging, it turned out the other companies would build the semantic model (aka Power BI dataset) directly on top of the data source. On a T&M basis, of course, what else? By contrast, I give fixed-price milestone-driven proposals and I don’t get paid unless I deliver and meet written and agreed upon success criteria, but that’s a different story.

So, let me count the ways as the poet would say. It’s certainly technically possible to slap a dataset on top of the data source(s). That’s what self-service BI is all about right … until it doesn’t serve anymore

Read on for more detail.

Comments closed