
Category: Architecture

Delta Versus Lambda Architectures

Hector Leano compares the delta and lambda architectures:

Generally, a simple data architecture is preferable to a complex one. Code complexity increases points of failure, requires more compute to run jobs, adds latency, and increases the need for support. As a result, data pipeline performance degrades over time, increasing costs while decreasing productivity as your data engineers spend more time troubleshooting and downstream users wait longer for data refreshes.

Complexity was perceived as a necessary evil for the automated data pipelines feeding business reporting, SQL analytics, and data science because the traditional approach for bringing together batch and streaming data required a lambda architecture. While a lambda architecture can handle large volumes of batch and streaming data, it increases complexity by requiring different code bases for batch and streaming, and it tends to cause data loss and corruption. In response to these data reliability issues, the traditional data pipeline architecture adds even more complexity by adding steps like validation, reprocessing for job failures, and manual update and merge.

On the one hand, lambda was always intended to be a compromise architecture based on the tools of the time. On the other hand, take this with as many grains of salt as you need given that the post comes from the primary company responsible for delta.
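
To make the "different code bases" complaint concrete, here is a minimal sketch in plain Python of the lambda pattern. It is not taken from Hector's post; the event shape, function names, and data are all hypothetical. The point is only that the same aggregation rule has to be written twice, once for the batch layer and once for the speed layer, and then stitched together at serving time.

```python
# Hypothetical events: (event_id, user, amount). Nothing here comes from the post.

def batch_totals(events):
    """Batch layer: recompute totals from the full historical log."""
    totals = {}
    for _event_id, user, amount in events:
        totals[user] = totals.get(user, 0) + amount
    return totals


class SpeedLayer:
    """Speed layer: a second copy of the same rule, applied one event at a time."""

    def __init__(self):
        self.totals = {}

    def on_event(self, event):
        _event_id, user, amount = event
        self.totals[user] = self.totals.get(user, 0) + amount


def serving_view(batch, speed):
    """Serving layer: merge the batch view with the not-yet-batched recent view."""
    merged = dict(batch)
    for user, amount in speed.items():
        merged[user] = merged.get(user, 0) + amount
    return merged


historical = [(1, "ana", 10), (2, "bo", 5)]
recent = [(3, "ana", 7)]

speed = SpeedLayer()
for event in recent:
    speed.on_event(event)

merged = serving_view(batch_totals(historical), speed.totals)
```

If the rule changes in one layer but not the other, or an event lands in both the batch log and the speed layer, the merged view quietly goes wrong, which is the sort of reliability problem the quoted passage describes.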

Comments closed

Scaling ksqlDB, with Animations

Michael Drogalis walks us through scaling models with ksqlDB:

Software engineering memes are in vogue, and nothing is more fashionable than joking about how complicated distributed systems can be. Despite the ribbing, many people adopt them. Why? Distributed systems give us two things their single node counterparts cannot: scale and fault tolerance.

ksqlDB, the event streaming database, is built with a client/server architecture. You can run it with a single server, or you can cluster many servers together. Part 1 and part 2 in this series explained how a single server executes stateless and stateful operations. This post is about how these work when ksqlDB is deployed with many servers, and more importantly how it linearly scales the work it is performing—even in the presence of faults.

If you like, you can follow along by executing the example code yourself. ksqlDB’s quickstart makes it easy to get up and running.

Click through for well-animated examples.
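
As a rough mental model of why adding servers scales the work, here is a small Python sketch that spreads a fixed set of partitions across however many servers are alive. This is an illustration only: the round-robin rule and the server names are my assumptions, and the real assignment logic in ksqlDB and Kafka is more sophisticated than this.

```python
def assign_partitions(partitions, servers):
    """Spread partitions across servers round-robin (an assumed, simplified rule)."""
    if not servers:
        raise ValueError("at least one server is required")
    assignment = {server: [] for server in servers}
    for index, partition in enumerate(partitions):
        assignment[servers[index % len(servers)]].append(partition)
    return assignment


partitions = list(range(6))

# Two servers split the work in half.
two = assign_partitions(partitions, ["server-a", "server-b"])

# Adding a third server lowers the share each one carries.
three = assign_partitions(partitions, ["server-a", "server-b", "server-c"])

# If server-b fails, its partitions are picked up by the survivors.
after_failure = assign_partitions(partitions, ["server-a", "server-c"])
```

The unit of parallelism in this sketch is the partition, so once there are as many servers as partitions, adding more servers would not help.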


PASS Summit Q&A: The Curated Data Platform

I answer some questions:

On Thursday, I presented a session at PASS Summit entitled The Curated Data Platform. You can grab slides and links to additional information on my website. Thank you to everyone who attended the session.

During and after the session, I had a few questions come in from the audience, and I wanted to cover them here.

Most of the questions were around document databases, so check them out.


Tips for Moving to a Multi-Tenant Setup

Adrian Hills continues a series on multi-tenant SQL Server:

What you knew a few years ago might differ significantly from the reality today, whether you started off with a single tenant system that you pivoted quickly to support multiple tenants, or you envisioned 10s of tenants and ended up with 1000s. Whatever that reality is, when you experience pain points around the 3 considerations I covered in part 1 of this series (security, maintainability, and scalability), it can lead to a need to change the multi-tenancy approach you’re using. Often, the biggest driver for change is around performance and scalability and typically tends to be related to a need to move from a less-isolated multi-tenancy approach (single database) to a more-isolated approach that supports the scaling out of workloads (multiple databases).

In the steps below, I’ll cover the general path you can follow to successfully make an architectural change like this to the database layer. As a provider of database performance monitoring and DataOps tools, SentryOne offers some tools that can come in handy along the way, so I’ll call those out as we go.

Click through for guidance.


Multi-Tenant Database Designs

Adrian Hills walks us through four database designs for multi-tenant data storage:

In my previous blog post, I talked about some of the key considerations around designing a multi-tenant system using SQL Server. There are several ways to implement multi-tenancy, and, as is often the case, there is no single “best” way but rather a range of options that each offer different trade-offs. The approach that is right for you depends on your objectives and needs for your specific environment. It’s important to consider which of these approaches best suits your requirements and goals based on the 3 core considerations from Multi-Tenancy with SQL Server, Part 1: security, maintainability (manageability), and scalability.

The following are the 4 approaches I will cover in this blog post:
1. Single database, shared schema
2. Single database, separate schema
3. Database per tenant
4. Multiple databases, multiple tenants per database, shared schema

I’ve worked with options 1, 3, and 4. Read on for Adrian’s thoughts. Ceteris paribus, my preference is 3. That said, I’ve worked in a situation where I migrated from 3 to 1 because there were thousands of customers, none of whom had more than hundreds of megabytes worth of data. 4 provides a good balance in that fashion, where you can bunch up smaller clients and give larger clients their own databases (and sometimes even servers). But if you’re going with option 2, 3, or 4, you probably want a central data warehouse which collects data across all tenants for internal use.
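
Option 4 usually implies some kind of tenant catalog that tells the application where each tenant lives. Here is a minimal Python sketch of that idea. The server names, database names, and connection string format are hypothetical, and this is not Adrian's implementation, just one way the lookup could be shaped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantPlacement:
    server: str
    database: str


# Hypothetical catalog: small tenants share a database, a large one gets its own.
CATALOG = {
    "small-co-1": TenantPlacement("sql-01", "Tenants_Shared_A"),
    "small-co-2": TenantPlacement("sql-01", "Tenants_Shared_A"),
    "big-corp": TenantPlacement("sql-02", "Tenant_BigCorp"),
}


def connection_string(tenant_id):
    """Look up where a tenant lives; unknown tenants raise KeyError."""
    placement = CATALOG[tenant_id]
    return f"Server={placement.server};Database={placement.database};"


def move_tenant(tenant_id, server, database):
    """Repoint a tenant after its data has been migrated (migration not shown)."""
    CATALOG[tenant_id] = TenantPlacement(server, database)
```

Tenants in a shared database still need a tenant key on every row and in every query. The catalog only decides which database to connect to.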


Power BI Icons for Diagrams.Net

Marc Lelijveld has some icons for us:

Previously, I used a simple PowerPoint slide when I drafted technical solution proposals. This took me a whole lot of time copy-pasting all the images, making it look nice, and connecting the dots together. While tools like diagrams.net are built for this purpose, I always stuck with PowerPoint as there were no icons for all Power BI objects in this tool. Until now!

The online tool Diagrams.net allows you to quickly draft your solution architecture by dragging and dropping icons on a white canvas and easily connecting the dots together.

I’ve been a big fan of diagrams.net (nee draw.io), so thank you Marc for putting this together.


The Downside of EAV-Style Measures in Power BI

Chris Webb explains why you should try to stick to the fact-dimensional model in Power BI:

In this fact table the dimension keys remain the same, but the Value column stores all the data from the Sales, Tax and Volume Sold measures in the original table and the Measure Name column tells you what type of measure value is stored on any given row. Let’s call this approach the Measures Dimension approach.

There are some advantages to building fact tables using the Measures Dimension approach, for example:

– You can now use a slicer in a report to select the measures that appear in a visual
– You can now easily add new measures without having to add new columns in your fact table
– You can use row-level security to control which measures a user has access to

Generally speaking, though, any time you deviate from a conventional dimensional model you risk running into problems later on and this is no exception. Let’s go through the disadvantages of modelling data using a Measures Dimension.

Read on for several good reasons (and yes, “things are formatted wrong” is a good reason!).
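
For anyone who has not seen the shape before, here is a small Python sketch of turning a conventional wide fact table into the Measures Dimension layout Chris describes. The measure names come from the excerpt; the key columns and the values are made up for illustration.

```python
# Hypothetical wide fact rows: one column per measure.
wide_rows = [
    {"DateKey": 20200101, "ProductKey": 1, "Sales": 100.0, "Tax": 8.0, "Volume Sold": 4},
    {"DateKey": 20200101, "ProductKey": 2, "Sales": 50.0, "Tax": 4.0, "Volume Sold": 2},
]
MEASURES = ["Sales", "Tax", "Volume Sold"]


def unpivot(rows, measures):
    """Emit one row per (original row, measure) with Measure Name and Value columns."""
    long_rows = []
    for row in rows:
        # Dimension keys are every column that is not a measure.
        keys = {name: value for name, value in row.items() if name not in measures}
        for measure in measures:
            long_rows.append({**keys, "Measure Name": measure, "Value": row[measure]})
    return long_rows


long_rows = unpivot(wide_rows, MEASURES)
```

Two rows become six, and the single Value column now mixes currency amounts with unit counts, so one format string can no longer fit everything in it.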


Azure Icons in Diagrams.Net

Dave Ruijter walks us through using Azure icons in diagrams.net (nee draw.io):

How to use the icon collection in draw.io (diagrams.net)

I’ve made it easy for you: I’ve created a couple of custom libraries with all the icons! I’ve uploaded them to my GitHub, and you can use them in a couple of ways.

The first and simplest option is to just start the diagrams.net app with a URL with the correct libraries included as URL parameters:

Click through to read the whole thing. I’m a long-time fan of diagrams.net and am glad to see Dave’s offering for us.


ksqlDB 0.11.0

Victoria Xia announces ksqlDB 0.11.0:

ksqlDB 0.11.0 contains improvements and fixes spanning stranded transient queries, overly aggressive schema compatibility checks, confusing behavior around casting nulls, bad schema management, and more. Here, we highlight a couple of additional, notable improvements.

Also on my backlog was Andy Coates, talking about key columns in ksqlDB:

ksqlDB 0.10 includes significant changes and improvements to how keys are handled. This is part of a series of enhancements that began with support for non-VARCHAR keys and will ultimately end with ksqlDB supporting multiple key columns and multiple key formats, including Avro, JSON, and Protobuf.

Before looking at the syntax changes in version 0.10, let’s first look at what is meant by keys in ksqlDB, the two types of key columns, and how this may differ from other SQL systems.

Read on, as it’s an interesting look at how different data architectures can mean radically different recommendations for key design.


Using the Azure Architecture Icons

Steve Jones tries out some of the Azure Architecture Icons:

The icons are SVG, so while they work in PowerPoint, adding them to something like this post in OpenLiveWriter doesn’t work. However, I could make a quick diagram and capture an image of it.

Not great, but it shows I can put icons on a page with arrows.

Going one step further, I’ve been digging into Diagrams by mingrammer lately. With it, you use Python to generate diagrams, and there are quite a few Azure icons in there, as well as AWS, on-prem, etc.

Here’s a quick example of what you can do, taken from an upcoming talk of mine:

There are some limitations based on the underlying library, such as how you can’t connect cluster to cluster—meaning I can’t draw a line from “Logging” to “Storage\Logs”; I have to draw it from a particular element (Loki) to a particular element (Elasticsearch). In a lot of traditional reference architecture diagrams, though, that isn’t a problem.
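
Here is a minimal sketch of that element-to-element workaround using the Diagrams library. It is not the diagram from the talk: the cluster layout and labels are my reconstruction from the description above, and the file name is hypothetical. Rendering needs the diagrams package and Graphviz installed, so the imports sit inside the function and nothing is drawn until you call it.

```python
def build_logging_diagram(filename="logging_sketch"):
    """Render a small diagram with an edge between two nodes in different clusters."""
    # Imported here so this module still loads where the diagrams package
    # (and Graphviz, which it uses for rendering) is not installed.
    from diagrams import Cluster, Diagram
    from diagrams.elastic.elasticsearch import Elasticsearch
    from diagrams.onprem.logging import Loki

    # show=False writes the image file without opening a viewer.
    with Diagram("Logging sketch", filename=filename, show=False):
        with Cluster("Logging"):
            loki = Loki("Loki")
        with Cluster("Storage"):
            with Cluster("Logs"):
                search = Elasticsearch("Elasticsearch")
        # Edges connect nodes, not clusters, so the line runs from Loki to
        # Elasticsearch rather than from "Logging" to "Storage\Logs".
        loki >> search
```

Calling build_logging_diagram() should write logging_sketch.png to the working directory, assuming the default output format.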
