
Category: Architecture

A Primer on Apache Cassandra Reads and Writes

Utkarsh Upadhyay explains some of the internals of reading and writing with Apache Cassandra:

Apache Cassandra is a NoSQL database. It handles large amounts of data across many commodity servers. Being a highly scalable and high-performance distributed database, it provides high availability with no single point of failure. In this blog, I focus mainly on reads and writes in Cassandra. For Cassandra architecture, you can refer to the blog Apache Cassandra: Back to Basics. So let’s get started with this blog on Apache Cassandra: Reads and Writes.

Click through for a comparison between Cassandra and MySQL, followed by a high-level architectural explanation of read and write operations in Cassandra. One thing that raises my eyebrow, though, is the statement that reads in Cassandra are O(1). I can't say for certain that it's wrong, but it doesn't sound right: a read by partition key does route to the owning node in constant time, but that node may still need to check the memtable and multiple SSTables to assemble the row.


Multi-Cloud Pros and Cons

James Serra lays out some of the benefits and drawbacks of using multiple cloud providers:

A discussion I have seen many companies have is whether they should be single-cloud (using only one cloud company) or multi-cloud (using more than one cloud company). The three major Cloud Service Providers (CSPs) that companies use for nearly all use cases are Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform (GCP).

Without spoiling it too much, James is not really sold on the idea.


The Basics of Event-Driven Architecture

The Aiven team has a nice primer on event-driven architecture:

What happens when one link in the chain goes down? Requests that are waiting for a response don’t receive one at all. They continue to wait, or they time out. The entire application is blocked. What’s more, as the number of services increases, the number of synchronous interactions between them increases as well. In such a situation, a single system’s downtime affects the availability of other systems as well.

An alternative approach is building a microservices application on an event-driven architecture (EDA). Event-driven architecture is made up of decoupled components — producers and consumers — which process events asynchronously, often working through an intermediary, called a broker. That might feel like a mouthful. Don’t worry — we’re going to walk through these concepts one step at a time. In this article, we’re going to look at the components that make up event-driven architecture, why you would use this paradigm, and how to implement it.

Read on to see what makes it so interesting.
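
If you'd like to see the pattern in miniature without leaving SQL Server, Service Broker implements the same producer/consumer/broker idea in T-SQL. This is just a sketch of the concept, not anything from Aiven's article, and all of the object names here are hypothetical; it assumes a database with Service Broker enabled:

    -- One-time setup: the "broker" pieces (message type, contract,
    -- queue, and service).
    CREATE MESSAGE TYPE OrderPlaced VALIDATION = WELL_FORMED_XML;
    CREATE CONTRACT OrderContract (OrderPlaced SENT BY INITIATOR);
    CREATE QUEUE OrderQueue;
    CREATE SERVICE OrderService ON QUEUE OrderQueue (OrderContract);

    -- Producer: send the event and move on; nobody blocks waiting
    -- for a response.
    DECLARE @dialog UNIQUEIDENTIFIER;
    BEGIN DIALOG CONVERSATION @dialog
        FROM SERVICE OrderService
        TO SERVICE 'OrderService'
        ON CONTRACT OrderContract
        WITH ENCRYPTION = OFF;
    SEND ON CONVERSATION @dialog
        MESSAGE TYPE OrderPlaced (N'<order id="1" />');

    -- Consumer: pick the event up asynchronously, whenever it is ready.
    DECLARE @body XML;
    RECEIVE TOP (1) @body = CAST(message_body AS XML)
    FROM OrderQueue;

The point of the sketch is that the producer and consumer never talk to each other directly, so the consumer being down does not block the producer.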


Data Mesh and Ownership Strategies

James Serra aims to clear up some confusion:

I have done a ton of research lately on Data Mesh (see the excellent Building a successful Data Mesh – More than just a technology initiative for more details), and have some concerns about the paradigm shift it requires. My last blog tackled the one about Centralized vs decentralized data architecture. In this one I want to talk about centralized ownership vs decentralized ownership, along with another paradigm shift (or core principle) closely related to it, siloed data engineering teams vs cross-functional data domain teams.

First, I wanted to mention that there is a Data Mesh Learning Slack channel that I have spent a lot of time reading, and what is apparent is that there is a lot of confusion about exactly what a data mesh is and how to build it. I see this as a major problem: the more difficult it is to explain a concept, the more difficult it will be for companies to successfully build that concept. So the promise of a data mesh improving the failure rates of big data projects will be difficult to achieve if we can’t all agree on exactly what a data mesh is. What’s more, the core principles of the data mesh sound great in theory but will have challenges in implementation, hence my thoughts in this blog on centralized ownership vs decentralized ownership.

Read on for James’s take on the matter.


Centralized and Decentralized Data Architectures

James Serra looks at a pattern:

A centralized data architecture means the data from each domain/subject (e.g. payroll, operations, finance) is copied to one location (e.g. a data lake under one storage account), and that the data from the multiple domains/subjects are combined to create centralized data models and unified views. It also means centralized ownership of the data (usually IT). This is the approach used by a Data Fabric.

A decentralized distributed data architecture means the data from each domain is not copied but rather kept within the domain (each domain/subject has its own data lake under one storage account) and each domain has its own data models. It also means distributed ownership of the data, with each domain having its own owner.

So is decentralized better than centralized?

Read on for James’s answer, and allow me to include a Dilbert cartoon so old, the boss didn’t even have pointy hair yet.

[Dilbert cartoon: How Decentralized Organizations Can be Effective, via The Fourth Revolution Blog]


Centralized Data Modeling via Power BI Templates

Haroon Ashraf aims to square the circle:

Data modeling is the way you can arrange and link your organizational data (typically in the form of tables) for reporting and analysis.

In other words, it is the strategy of linking tables with each other to get useful information, following the standard practices and domain knowledge of the organization.

Traditionally, it stands for implementing the star or snowflake schema from the perspective of the data warehouse BI solution.

What is Centralized Data Modeling?

Centralized data modeling means a generic data model consisting of some commonly used tables, relationships, and hierarchies that are shared across the organization. These elements are the starting point for Power BI report development for anyone eligible, interested, and capable of doing so.

With that in mind, read on to learn how you can use Power BI templates to bring this about. I joke about squaring the circle here because if you treat Power BI as a self-service business intelligence tool, the users may not be totally familiar with what you’re doing and could end up accidentally undermining your plans. That said, it’s a good approach to solving this common problem.


Superkeys

Kevin Wilkie knows that not all keys wear capes:

Well, that’s because we sometimes need different ways to describe what we’ve got going on. The four types of keys we’ve discussed so far are all different enough that we need to differentiate them and be able to explain what the differences are. The same idea holds for the rest of the keys we’ll go through today: they are close to the others, but different enough that there is a need for another name for that type of key.

There is a super key.

Read on to learn what a superkey is. That will put you one quarter of the way to understanding Boyce-Codd Normal Form: a relational variable is in Boyce-Codd Normal Form if and only if all functional dependencies have superkey determinants.
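
For a concrete (and entirely hypothetical) illustration, consider a table where EmployeeID is a candidate key:

    CREATE TABLE dbo.Employee
    (
        EmployeeID   INT     NOT NULL PRIMARY KEY,
        NationalID   CHAR(9) NOT NULL UNIQUE,
        DepartmentID INT     NOT NULL,
        HireDate     DATE    NOT NULL
    );

    -- {EmployeeID} is a candidate key: unique and minimal.
    -- {EmployeeID, HireDate} is a superkey: it still uniquely identifies
    --   a row, but it is not minimal, since dropping HireDate preserves
    --   uniqueness.
    -- {DepartmentID, HireDate} is not a superkey: two employees can share
    --   both a department and a hire date.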


Delivering Data Insights using the Microsoft Data Platform

Paul Andrew has a talk:

Let’s start with a story, not a ‘once upon a time’ story, but a story for your backlog:

As a solution architect
I need to design and build an Azure data analytics platform end to end
to deliver data insights for my customer.

In February 2021, I delivered a talk as part of the Scottish Summit conference on how you could/should build an end-to-end data platform solution in Azure to deliver data insights and analytics. This is one of my favourite sessions, so I thought it worth re-sharing the recording here.

Click through for the abstract as well as the video.


Soft Deletes in SQL Server

Erik Darling has some thoughts on soft deletes:

Implementing soft deletes for an app that’s been around for a while can be tough, in the same way that adding Partitioning later to get data management value from it is tough (rebuilding clustered indexes on the partition scheme, making sure all nonclustered indexes are aligned, and all future indexes too, and making sure you have sufficient partitions at the beginning and end for data movement).

Read the whole thing. Incidentally, this also ties well into a recent post by Erik about deleting into a different table. It can be easier to implement soft deletes as deleting from the current table and adding to an archive table. That gives you the benefits of keeping deleted data while not running into some of the problems Erik mentions. And if you want to undo a deletion? Delete from the archive table and insert back into the main table.
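
As a rough sketch of that archive-table pattern (the table and column names here are made up), a single DELETE with an OUTPUT clause can move rows in one atomic statement:

    DECLARE @OrderID INT = 42;

    -- Soft delete: remove the row from the main table and capture it
    -- in the archive, along with a deletion timestamp.
    DELETE o
    OUTPUT DELETED.OrderID, DELETED.CustomerID, DELETED.OrderTotal,
           SYSUTCDATETIME()
    INTO dbo.OrdersArchive (OrderID, CustomerID, OrderTotal, DeletedDate)
    FROM dbo.Orders AS o
    WHERE o.OrderID = @OrderID;

    -- Undo: delete from the archive and flow the row back into the
    -- main table.
    DELETE a
    OUTPUT DELETED.OrderID, DELETED.CustomerID, DELETED.OrderTotal
    INTO dbo.Orders (OrderID, CustomerID, OrderTotal)
    FROM dbo.OrdersArchive AS a
    WHERE a.OrderID = @OrderID;

One caveat worth knowing: OUTPUT ... INTO requires that the target table have no enabled triggers and not participate in a foreign key relationship on either side.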
