Basically, the idea is to keep the fast stuff fast and the slow stuff slow. I wrote a paper 14 years ago on the challenges of real-time data warehousing. Fortunately, the data streaming, database, and BI layers have all evolved significantly since then, and there now exist databases and other data storage engines that can support the feature trinity needed to do both real-time and historical analytics right, without a Lambda architecture:
- Accept real-time streams of data at high rates.
- Simultaneously respond to large volumes of queries, including on the most recently added data.
- Store all the history needed for analysis. (A rough sketch of the three requirements working together follows this list.)
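To make the trinity concrete, here is a minimal, engine-agnostic sketch in Python. The `FastDataSink` class is a hypothetical stand-in, not any particular product's API: one thread streams events in at a high rate while queries run against the same store, seeing both the full history and rows added milliseconds earlier.

```python
# Hypothetical in-process "fast data sink": a single append-only store that
# accepts streaming writes while answering queries over all of its history,
# including the most recently ingested rows.
import threading
import time
from bisect import bisect_left


class FastDataSink:
    def __init__(self):
        self._rows = []                 # append-only history of (timestamp, value)
        self._lock = threading.Lock()

    def ingest(self, ts, value):
        # Requirement 1: accept real-time writes at high rates.
        with self._lock:
            self._rows.append((ts, value))

    def query_since(self, ts):
        # Requirement 2: queries see the freshest data.
        # Requirement 3: the same store holds the full history.
        with self._lock:
            i = bisect_left(self._rows, (ts,))
            return list(self._rows[i:])


def writer(sink, stop):
    # Stream synthetic events until told to stop.
    i = 0
    while not stop.is_set():
        sink.ingest(time.time(), i)
        i += 1


if __name__ == "__main__":
    sink = FastDataSink()
    stop = threading.Event()
    w = threading.Thread(target=writer, args=(sink, stop))
    w.start()

    time.sleep(0.2)                                   # let some history accumulate
    fresh = sink.query_since(time.time() - 0.01)      # query only the last 10 ms
    stop.set()
    w.join()

    print(f"rows from the last 10 ms: {len(fresh)}")
    print(f"total history retained:  {len(sink.query_since(0))}")
```

A real fast data sink provides the same three guarantees with durability, indexing, and far greater write and query concurrency than a single in-process list; the sketch only illustrates why needing all three at once is what rules out bolting a separate batch and speed layer together.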
We call these engines “fast data sinks” and there are four main groups of them today: