Lambda Architecture In Azure

Jared Zagelbaum describes the Lambda architecture pattern and explains how you can use tooling in Azure to implement it:

Lambda is an organic result of the limitations of existing tools. Distributed systems architects and developers commonly criticize its complexity – and rightly so. Those of us that have worked extensively in Extract-Transform-Load and symmetric multiprocessing systems see red flags when code is replicated in multiple services. Ensuring data quality and code conformity across multiple systems, whether massively parallel processing (MPP) or symmetrically parallel system (SMP), has the same best practice: the least amount of times you reproduce code is always the correct number of times.

We reproduce code in lambda because different services in MPP systems are better at different tasks. The maturity of tools historically hasn’t allowed us to process streams and batch in a single tool. This is starting to change, with Apache Spark emerging as a single preferred compute service for stream and batch querying, hence the timing of Azure Databricks. However, on the storage side, what was meant to be an immutable store that is the data lake in practice, can become the dreaded swamp when governance or testing fails; which is not uncommon. A fundamentally different assumption to how we process data is required to combat this degradation. Enter: the kappa architecture, which we’ll examine in the next post of this series.

Interesting reading.

Related Posts

Jupyter Notebooks In Azure

Steve Jones looks at using Jupyter Notebooks in Azure: There’s a new feature in Azure, and I stumbled on it when someone posted a link on Twitter. Apologies, I can’t remember who, but I did click on the Azure Notebooks link and was intrigued. I’ve gotten Jupyter notebooks running on my local laptop, but these are often […]

Read More

Uploading Data Sets To Azure ML From R

Leila Etaati continues her series on the Azure ML R package by showing how to upload a data set: There is a function in AzureML package name “workspace” that creates a reference to an AzureML Studio workspace by getting the authentication token and workspace id as below: 1 ws <– workspace( id , auth  ) to […]

Read More


January 2018
« Dec Feb »