Press "Enter" to skip to content

Category: Architecture

Kafka Control and Data Planes

Sanjay Garde explains how the architecture of Apache Kafka solutions has expanded over time:

With the advent of service mesh and containerized applications, the idea of the control and data plane has become popular. A part of your application infrastructure, such as a proxy or sidecar, is dedicated to aspects such as controlling traffic, access, governance, security, and monitoring, and is referred to as the control plane. Another part of your application infrastructure that is used purely for processing your business transactions is referred to as the data plane.

Read on to see how the concept works at an architectural level.
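
To make the split a bit more concrete, here’s a rough Python sketch of the idea (my own illustration, not from Sanjay’s article; the class and handler names are made up): a sidecar-style wrapper owns the control-plane concerns of access and monitoring, while the business handler stays purely in the data plane.

```python
# Minimal sketch with assumed names: the control plane enforces access and
# records metrics; the data plane only runs business logic.
from dataclasses import dataclass

@dataclass
class ControlPlane:
    """Sidecar-style wrapper owning traffic, access, and monitoring concerns."""
    allowed_clients: set
    messages_seen: int = 0

    def admit(self, client_id: str) -> bool:
        self.messages_seen += 1                    # monitoring
        return client_id in self.allowed_clients   # access control

def data_plane_handler(payload: dict) -> dict:
    """Pure business processing, with no infrastructure concerns."""
    return {"order_id": payload["order_id"], "status": "processed"}

def handle(client_id: str, payload: dict, control: ControlPlane) -> dict:
    if not control.admit(client_id):
        raise PermissionError(f"client {client_id} is not authorized")
    return data_plane_handler(payload)

control = ControlPlane(allowed_clients={"billing-service"})
print(handle("billing-service", {"order_id": 42}, control))
```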


Recommendations for Dedicated SQL Pool Data Modeling

Bhaskar Sharma has some advice:

In this article, I will discuss how to physically model an Azure Synapse Analytics data warehouse while migrating from an existing on-premises MPP (Massively Parallel Processing) data warehouse solution like Teradata and Netezza. The approach and methodologies discussed in this article are purely based on the knowledge and insight I have gained while migrating these data warehouses to Azure Synapse dedicated SQL pool.

Dedicated SQL pools are close enough to regular SQL Server that we make a lot of assumptions about them, some of which may be wrong.


The Power of Metadata-Driven Development

Koen Verbeeck lays out a recommendation:

In this blog post I’ll talk about another of those rules/mantras/patterns/maxims:

build once, add metadata

I’m not sure if I’m using the right words; I heard something similar in a session by Spark enthusiast Simon Whiteley. He said you should only write code once, but make it flexible and parameterized, so you can add functionality just by adding metadata somewhere. A good example of this pattern can be found in Azure Data Factory: by using parameterized datasets, you can build one flexible pipeline that can copy, for example, any flat file, no matter which columns it has. I have blogged about this.

Click through to learn more about the concept, as well as some tips on how you’d do that in various data movement products (e.g., SSIS, ADF, Logic Apps).
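
As a rough illustration of the “build once, add metadata” idea, here’s a single generic copy routine in Python driven entirely by a metadata list (my own sketch, not Koen’s or Simon’s code; the file paths and delimiters are invented, and the landing files are assumed to already exist), much as a parameterized ADF pipeline is driven by a metadata table.

```python
# My own sketch of "build once, add metadata": one generic copy routine driven
# entirely by metadata rows; paths and delimiters here are invented examples.
import csv
from pathlib import Path

# Hypothetical metadata table: onboarding a new feed means adding a row here,
# not writing new copy code. Assumes the landing/ files already exist.
COPY_METADATA = [
    {"source": "landing/customers.csv", "target": "staging/customers.csv", "delimiter": ","},
    {"source": "landing/sales.txt",     "target": "staging/sales.csv",     "delimiter": "|"},
]

def copy_flat_file(source: str, target: str, delimiter: str) -> None:
    """Generic copy: handles any delimited flat file, whatever its columns."""
    Path(target).parent.mkdir(parents=True, exist_ok=True)
    with open(source, newline="") as src, open(target, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src, delimiter=delimiter):
            writer.writerow(row)

for entry in COPY_METADATA:
    copy_flat_file(entry["source"], entry["target"], entry["delimiter"])
```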


The Importance of Star Schemas in Power BI

Paul Turley lays out facts (and dimensions):

There is no secret about this. If you do any legitimate research about Power BI (reading blogs, books or training from reliable sources), you will quickly learn that a lot of basic functionality requires a dimensional model, aka “Star Schema”. This is a hard fact that every expert promotes, and self-taught data analysts either have learned or will learn through experience. So, if everyone agrees on this point, why do so many resist this advice?

Perspective is everything. I didn’t understand why getting to the star schema was so often out of reach until I was able to see it from another perspective. There are a few common scenarios that pull source data in directions other than an ideal dimensional model.

Read on for Paul’s take on the subject.


Reviewing Database Usage Trends

Brendan Tierney looks at the data:

Getting back to the topic of this post, I’ve gathered some data and obtained some league tables from some sites. These will help to have a closer look at what is really happening in the database market throughout 2022. Two popular sites constantly monitor the wider internet and judge how popular databases are globally: DB-Engines and the TOPDB Top Database index. These are well known and frequently cited. Both of these sites give some details of how they calculate their scores, with one focused mainly on how common the database appears in searches across different search engines, while the other, in addition to search engine results/searches, also looks across different websites, discussion forums, social media, job vacancies, etc.

I don’t necessarily believe that these rankings are precise, though on the whole, I do expect the results to be directionally accurate. I’ve used DB-Engines data several times in the past and like to point out that, for any given year, 7 or 8 of the top 10 database engines are relational.


Consistency Levels in Cassandra

Dmytro Kostenko enumerates some options:

In Cassandra, a consistency level is the number of replicas responding before returning a reply to a user. Consistency in Cassandra is tunable, meaning that each client can consider what level of consistency and availability to choose. Moreover, it is assigned at the query level and can be configured for different service components. Users can choose different consistency levels for each operation, both for reads and writes. While choosing the consistency level for your operation, you should understand each level’s tradeoff between consistency and availability. Cassandra’s consistency can be strong or weak, depending on your chosen level.

Read on to learn more about strong vs weak consistency in the context of Cassandra, as well as the consistency level options available to us.
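
As a quick illustration of tunable, per-query consistency, here’s a sketch using the DataStax Python driver; the cluster address, keyspace, table, and values are placeholders of mine, not anything from Dmytro’s post. With a replication factor of 3, pairing QUORUM reads with QUORUM writes gives strong consistency, while ONE on both sides trades consistency for availability and latency.

```python
# Sketch using the DataStax Python driver (pip install cassandra-driver);
# keyspace, table, and addresses are hypothetical placeholders.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")  # hypothetical keyspace

# Stronger read: QUORUM waits for a majority of replicas to respond.
read_stmt = SimpleStatement(
    "SELECT balance FROM accounts WHERE account_id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(read_stmt, ("a-1001",)).one()

# Weaker, more available write: ONE returns as soon as a single replica acks.
write_stmt = SimpleStatement(
    "UPDATE accounts SET balance = %s WHERE account_id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
session.execute(write_stmt, (150, "a-1001"))
```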


Use Cases for Multiple Data Lakes

James Serra explains why you might want multiple data lakes in an organization:

A question I get asked frequently from customers when discussing Data lake architecture is “Should I use one data lake for all my data, or multiple lakes?”. Ideally, you would use just one data lake, but I have seen many valid use cases where customers are using multiple data lakes. Here are some of those reasons:

I’d quibble with a couple of these (and given James’s intro, I’m not sure he’s fully on board with all of the reasons), but this is a good list of reasons why you might see several data lakes in an organization.


Well-Architected Framework for Oracle in Azure

Kellyn Pot’vin-Gorman has a new tool for us:

This invaluable framework provides clear guidance on the recommended practices to assess, architect, and migrate Oracle workloads to the Azure cloud. This should be the first place to look for answers on succeeding with Oracle on Azure!

A special thanks to my teammate, Jessica Haessler, for working so hard to help me get this to the finish line, as I would never have been able to get this done on my own!

Click through for a link to the guide. There isn’t a Well-Architected Framework assessment for this yet, but the WAF articles themselves have quite a bit of detail to them.


Storing Semi-Additive Facts as Timespans

Timo Zishiri gives a new spin to a common warehousing problem:

In these cases, the measure may be aggregated across dates by averaging over the number of periods, e.g., average daily inventory levels. Measures can also be aggregated across dates by taking the maximum/minimum for the time interval.

More specifically, this blog focuses on an alternative approach to providing end users with the ability to do point-in-time analysis, so-called trend analysis.

Click through to see how a timespan table would work.
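
For a feel of how a timespan table supports point-in-time analysis, here’s a small pandas sketch (my own illustration; the table and column names are invented, not Timo’s design): each row records an inventory level along with the date range for which it was valid, and a point-in-time lookup simply filters on that range.

```python
# Illustrative pandas sketch: store each inventory level as a timespan row
# (valid_from / valid_to) instead of one snapshot row per day.
import pandas as pd

inventory_spans = pd.DataFrame({
    "product":    ["Widget", "Widget", "Gadget"],
    "on_hand":    [100, 80, 25],
    "valid_from": pd.to_datetime(["2022-01-01", "2022-01-15", "2022-01-01"]),
    "valid_to":   pd.to_datetime(["2022-01-14", "2022-03-31", "2022-03-31"]),
})

def stock_as_of(df: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Point-in-time lookup: find the span covering the requested date."""
    ts = pd.Timestamp(as_of)
    return df[(df["valid_from"] <= ts) & (ts <= df["valid_to"])][["product", "on_hand"]]

print(stock_as_of(inventory_spans, "2022-01-20"))
# Widget shows 80 on hand as of that date; Gadget shows 25.
```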


The Importance of Proper Data Modeling in Power BI

Paul Turley avoids “big, wide tables”:

Power BI is architected to consume data in a dimensional model, with narrow fact tables and related dimensions. Introducing a big, wide table in a tabular model is extremely inefficient. It takes up space and memory resources, impacts performance, and complicates measure coding. Flattening records into a flat table is one of the worst things you can do in Power BI and a common mistake made by novice Power BI users.

This is a conversation I’ve had with many customers. We want our cake, and we want to eat it too. We want to have all the analytic capabilities, interactivity, and high performance, but we also want the ability to drill down to a lot of detail. What if we have a legitimate need to report on transaction details and/or a large table with many columns? It is well known that the ideal shape is a star schema, but what if we need to shape data for detail reporting? The answer is that you can have it both ways, but just not in one table.

Read on for a better model design (hint: the Kimball style) as well as several tips and tricks.
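
As a tiny illustration of the “both ways, but not in one table” point, here’s a hedged pandas sketch (my own, with invented column names) that splits a wide, flat sales extract into a narrow fact table plus a product dimension, Kimball-style; in Power BI you would typically do the equivalent reshaping in Power Query or upstream, but the logic is the same.

```python
# Sketch with invented column names: split one wide, flat sales extract into
# a narrow fact table plus a product dimension.
import pandas as pd

flat = pd.DataFrame({
    "order_id":      [1, 2, 3],
    "order_date":    ["2022-01-05", "2022-01-06", "2022-01-06"],
    "product_name":  ["Widget", "Widget", "Gadget"],
    "product_group": ["Hardware", "Hardware", "Hardware"],
    "sales_amount":  [10.0, 12.5, 99.0],
})

# Dimension: one row per product, with a surrogate key.
dim_product = (flat[["product_name", "product_group"]]
               .drop_duplicates()
               .reset_index(drop=True))
dim_product["product_key"] = dim_product.index + 1

# Fact: just keys and measures, which compresses and aggregates well.
fact_sales = (flat.merge(dim_product, on=["product_name", "product_group"])
                  [["order_id", "order_date", "product_key", "sales_amount"]])

print(dim_product)
print(fact_sales)
```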
