In Lieu Of Lambda Architecture, Using Faster Databases

Justin Langseth argues that the Lambda architecture is not really necessary if you are using the right data stores:

Basically, the idea is to keep the fast stuff fast and the slow stuff slow. I wrote a paper 14 years ago on the challenges of real-time data warehousing. Fortunately, both the data streaming, database, and BI layers have all evolved significantly since then, and now there exist databases and other data storage engines which can support the feature trinity that is needed to do both real-time and historical analytics right, without a Lambda architecture:

  1. Accept real-time streams of data at high rates.
  2. Simultaneously respond to large volumes of queries, including on the most recently added data.
  3. Store all the history needed for analysis.

We call these engines “fast data sinks” and there are four main groups of them today:

It’s an interesting argument.

Related Posts

Apache Avro 1.9.0 Released

Fokko Driesprong announces the release of Apache Avro 1.9.0: Avro is a remote procedure call and data serialization framework developed within Apache’s Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. If you’re unfamiliar with Avro, I would highly recommend the explanation of Dennis Vriend […]

Read More

Temporal Tables with Flink

Marta Paes shows off a new feature in Apache Flink: In the 1.7 release, Flink has introduced the concept of temporal tables into its streaming SQL and Table API: parameterized views on append-only tables — or, any table that only allows records to be inserted, never updated or deleted — that are interpreted as a changelog and […]

Read More

Categories

October 2018
MTWTFSS
« Sep Nov »
1234567
891011121314
15161718192021
22232425262728
293031