Category: Architecture

Data Architecture Questions to Ask

Published 2022-07-27 by Kevin Feasel

As an example, if I were asked what product to use to store data in the Azure cloud, I could come up with at least a dozen options, so I need to ask questions to narrow the choices down to the best fit for the customer's situation. This avoids something I have seen many times: a company chooses a particular product, and after their solution is built, they say the product is “terrible”, but they were using it for a use case it was not designed for. The customer was not aware of a better product for their use case because “they don’t know what they don’t know”. That is why working with an expert architect should be one of your first orders of business: the technology decisions made early in building a solution are vital to get right, as finding out six months or a year later that you made the wrong choice and have to start over can lead to a huge amount of wasted time and money (and I have seen some shocking waste).

Read on for a slew of questions.

Starting a Data Mesh Project

Published 2022-07-26 by Kevin Feasel

Paul Andrew continues a series on data mesh:

A common question I get when creating a data mesh architecture is where to start. The consultant in me defaults to answering ‘it depends’, of course.
However, in this blog post I want to give a better answer based on my experience of working with various customers so far. As always, the usual caveats apply: I’m happy to go first in defining a starting point for our data mesh delivery, and I fully accept that parts of this are probably wrong. This is also founded on the knowledge that every customer I’ve worked with is different, with different priorities and very subjective views on why they even need a data mesh architecture, not to mention various levels of data platform maturity.

Paul also includes some nice roadmap and architectural box-drawing diagrams, so check those out.

Azure SQL Database and the Well-Architected Framework

Published 2022-07-21 by Kevin Feasel

Jason Bouska has a big announcement:

Microsoft Azure SQL Database is a fully managed cloud database (PaaS) that handles many database management tasks without user intervention. Tasks such as patching, upgrading, taking backups, and monitoring can be configured to the specific needs of the workload and are performed in the background. Azure SQL Database runs the latest stable version of SQL Server on a patched OS, with 99.99% availability. The intelligent automated functions built into the database free up the user to focus on other important tasks.
Today I am introducing the Azure Well-Architected Service Guide for Azure SQL Database. Like other service guides, this guide for Azure SQL Database contains design considerations, checklists, and detailed configuration recommendations that can assist cloud architects in deploying optimal Azure SQL workloads in line with the guiding tenets of the Well-Architected Framework: security, reliability, cost management, performance efficiency, and operational excellence.

I’ve found that the Well-Architected Framework (whose overloaded acronym is still annoying) works best once you’re far enough along that you have a good idea of workload characteristics, meaning it’s not for the pre-planning stage. Also, a full review might take hours or days and require several people to complete, not just a DBA.

The Basics of Snowflake Architecture

Published 2022-07-20 by Kevin Feasel

Arun Sirpal lays out the foundation of Snowflake DB’s architecture:

At the most basic level, Snowflake has three important components: the cloud services layer, the centralised storage layer, and the compute layer.
Cloud services: they call this the “brains” of Snowflake. This is where infrastructure management takes place, where the (cost-based) optimiser lives, and where metadata management and security (authentication and access control) are handled.

Read on to learn about the other two layers and how they meet.

The Importance of a Proper Datamart / Data Warehouse

Published 2022-07-19 by Kevin Feasel

Teo Lachev explains why you want a datamart (or a data warehouse) for BI solutions:

I sent a proposal for implementing a classic BI solution: an Azure SQL-based datamart (not a Power BI datamart, please), ETL, a semantic model, and reports. The client had sticker shock. Return to sender … as other BI companies that quoted can do it for half! Upon digging, it turned out the other companies would build the semantic model (aka Power BI dataset) directly on top of the data source. On a T&M basis, of course, what else? By contrast, I give fixed-price, milestone-driven proposals, and I don’t get paid unless I deliver and meet written and agreed-upon success criteria, but that’s a different story.
So, let me count the ways, as the poet would say. It’s certainly technically possible to slap a dataset on top of the data source(s). That’s what self-service BI is all about, right … until it doesn’t serve anymore.

Read on for more detail.

Proofs of Concept and Pilots

Published 2022-07-15 by Kevin Feasel

Kenneth Fisher strikes a chord:

If your POC does not follow your company’s best practices and standards, then it is not a valid POC.
There are way too many settings that will change its performance, cause security issues, etc. On top of that, almost every POC I’ve ever seen ends up becoming the test environment, if not the actual production environment. So all of those little compromises end up in your actual, non-POC environment because it’s way too much work to fix them now. You should have said something when we set this up.

To use one of my favorite lines, “Short answer: yes with an if; long answer: no with a but.”

Before I get to the answer, I do want to differentiate between a proof of concept and a pilot. The idea of a proof of concept is to see if I can make this thing work. Can I get these two processes to talk to each other? Can I build a website which accepts user input and displays something? Can I get this idea from my head into code? Can I process 500,000 records per second using our existing hardware? One important thing about a proof of concept is that it always has the possibility of failure: “No” is a valid answer here, based on the conditions. Also, you want that answer as fast as is reasonable so that your business decision-makers can make business decisions on that information. By contrast, a pilot is a starter for the full project. When we do a pilot, we already know the answer is yes; we just need to build it out and answer the technical details along the way. You might work with one business unit instead of all of them, migrate a small amount of traffic to the new system, or only handle data from a single branch office.

Returning to the line above: Yes, I agree with Kenneth if your company lacks the discipline to differentiate between proofs of concept and pilots (and that’s not as denigrating a remark as it sounds…though it’s somewhat denigrating). No, do not follow the same practices for a proof of concept that you would for a full product, but you need to ensure that code gets destroyed and you start over with new code which does follow those practices.

Have One Data Model per Business Area

Published 2022-07-15 by Kevin Feasel

James McGillivray offers us an important piece of advice:

I cannot stress this enough. If people are consuming your data in multiple places, the data needs to come from the same data model. That can be an Enterprise Data Warehouse, a Data Mart, a Power BI Model, or any other data source, but at some point you need to be able to track the data back to a single place. If you don’t do this, you will spend THE REST OF YOUR DAYS explaining the differences between the data models to business and customers, and reconciling the differences over and over again.

Read on to learn why this is so important.

Parameterizing Queries with Amazon Athena

Published 2022-07-12 by Kevin Feasel

Blayze Stefaniak et al. architect a service to provide data via Amazon Athena:

Customers tell us they are finding new ways to make effective use of their data assets by providing data as a service (DaaS). In this post, we share a sample architecture using parameterized queries applied in the form of a DaaS application. This is helpful for many types of organizations, whether you’re an enterprise making data available to other lines of business, a regulator making reports available to your industry, a company monetizing your data assets, an independent software vendor (ISV) enabling your applications’ tenants to query their data when they need it, or an organization trying to share data at scale in other ways. In DaaS applications, you can provide predefined queries to run against your governed datasets with values your users input. You can expand your DaaS application to break away from monolithic data infrastructure by treating data as a product (DaaP) and providing a distribution of datasets, each with its own domain-specific data pipeline. You can then authorize consumers to use these datasets through your DaaS application’s permissions. You can use Athena parameterized queries to predefine your queries, run them across your datasets, and serve as a layer of protection for your DaaS applications. This post first describes how parameterized queries work, then applies parameterized queries in the form of a DaaS application.

Click through to learn how.
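The core idea in that excerpt is that consumers pick from predefined queries and only supply values. Here is a minimal Python sketch of such a layer, assuming boto3’s Athena client, whose `start_query_execution` accepts an `ExecutionParameters` list that fills `?` placeholders in order. The query catalogue, table names, and `daas-workgroup` name are hypothetical placeholders, and this is not the code from the AWS post.

```python
# Minimal sketch of a DaaS layer over Amazon Athena parameterized queries.
# Assumptions: boto3's Athena client; the query catalogue, table names, and
# workgroup name below are hypothetical placeholders.
from typing import Any, Dict, Optional, Sequence, Union

Value = Union[str, int, float]

# Hypothetical predefined queries. Consumers pick a query by name and supply
# values; they never send SQL text. Each "?" is a positional parameter that
# Athena fills, in order, from ExecutionParameters.
PREDEFINED_QUERIES: Dict[str, str] = {
    "sales_by_region": (
        "SELECT region, SUM(amount) AS total FROM sales "
        "WHERE region = ? AND sale_year = ? GROUP BY region"
    ),
    "customer_orders": "SELECT order_id, amount FROM orders WHERE customer_id = ?",
}


def to_parameter(value: Value) -> str:
    """Render one user-supplied value as an Athena execution parameter string."""
    if isinstance(value, bool):  # bool is a subclass of int, so reject it first
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # Strings go in as SQL string literals: single-quoted, with any
        # embedded single quote doubled.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def build_request(query_name: str, values: Sequence[Value],
                  workgroup: str = "daas-workgroup") -> Dict[str, Any]:
    """Build keyword arguments for athena.start_query_execution."""
    if query_name not in PREDEFINED_QUERIES:
        raise KeyError(f"unknown query: {query_name}")
    sql = PREDEFINED_QUERIES[query_name]
    # Counting "?" is safe here only because no predefined query contains a
    # literal question mark.
    expected = sql.count("?")
    if len(values) != expected:
        raise ValueError(
            f"{query_name} expects {expected} value(s), got {len(values)}")
    return {
        "QueryString": sql,
        "ExecutionParameters": [to_parameter(v) for v in values],
        "WorkGroup": workgroup,
    }


def run_query(query_name: str, values: Sequence[Value],
              client: Optional[Any] = None) -> Optional[str]:
    """Start the query if an Athena client is supplied; return its execution ID."""
    request = build_request(query_name, values)
    if client is None:
        return None  # dry run: validation only, nothing is sent to AWS
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Shows the request a consumer's inputs would produce, without calling AWS.
    print(build_request("sales_by_region", ["West", 2022]))
```

With a real client, `run_query("customer_orders", [42], boto3.client("athena"))` would start the query. The sketch assumes the workgroup already has a query result location configured, since it does not pass `ResultConfiguration`. Restricting consumers to named queries means they only ever supply values, which matches the “layer of protection” role the post describes. The quote-doubling helper is an extra defensive step in this sketch, not something the post prescribes.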

Searching Industry Templates for Lake Databases in Synapse

Published 2022-07-11 by Kevin Feasel

Lakshmi Murthy is just browsing:

With Azure Synapse Database Templates generally available, our customers constantly want to see and learn more about how to use these templates. Through these blogs we want to share tips and tricks our customers can use to work with these templates efficiently. We’ve recently received several questions about the different ways a user can navigate these templates to create their lake databases. In this blog, I’d like to walk through a few options that may come in handy as you give database templates a try.
Azure Synapse Analytics offers a no-code database designer which allows you to browse these database templates and select and customize the tables you want to use to model your enterprise data. There are several ways to browse the tables provided by the comprehensive industry templates within the designer’s exploration experience. Though the user experience is super intuitive, there are a few tips and tricks that can make this process even easier. Let’s do a quick tour to learn about the different ways to browse these templates.

Click through for a few different ways to look at standard tables for different industries.

Guidance on When to Use Azure Data Explorer

Published 2022-06-13 by Kevin Feasel

Tzvia Gitlin Troyna has a flow chart for us:

Azure Data Explorer is a big data interactive analytics platform that empowers people to make data-driven decisions in a highly agile environment. The factors listed below can help assess whether Azure Data Explorer is a good fit for the workload at hand. These are the key questions to ask yourself.
The following flowchart summarizes the key questions to ask when you’re considering using Azure Data Explorer.

In addition to the flowchart, there is a table of three common patterns of interaction which Azure Data Explorer (ADX) handles well.
