Lambda Architecture

Koos van Strien looks at lambda architecture and asks if it works for data warehouses:

The Lambda Architecture is pretty well documented – online1 as well as in the book I just mentioned2. For a quick overview, Lambda Architecture is basically a system where the raw data is always stored, and never thrown away. All information that’s derived from this raw data is always recomputed – often stated as query = function(all data). This provides for a fool-proof architecture that’s rigorously simple (compared to classic RDBMS solutions), made up of three layers:

Admittedly, about half of this went over my head, but there are some good book and webpage recommendations to learn more about lambda architecture and Data Vault.

Related Posts

HDP 3.0 Updates To Hive And Druid

Nishant Bangarwa has some updates to Apache Druid in HDP 3.0: There are numerous improvements that went into HDP 3.0 and the performance improvements shown are an aggregate result of all of them. Here are some of the more noteworthy improvements related to Druid-Hive integration : Druid Expressions Support – HIVE-18893/ CALCITE-2170   added support for Druid expressions in Hive. […]

Read More

Redshift Architecture Performance Tips

John Ryan has a few hints to help us build speedy Redshift clusters: The Need to Vacuum As Redshift does not reclaim free space automatically, updates and delete operations can frequently lead to table growth. Equally, it’s important as new entries are added, that the data is maintained in a sorted sequence. The VACUUM command is […]

Read More

Categories

May 2016
MTWTFSS
« Apr Jun »
 1
2345678
9101112131415
16171819202122
23242526272829
3031