Justin Hayes gives us an overview of using Cloudera Data Platform for real-time data warehousing (RTDW):
The simplest way to describe a RTDW is that it looks and feels like a normal data warehouse, but everything is faster even while massive scale is maintained. It is a type of data warehouse modernization that lets you have “small data” semantics and performance at “big data” scale.
– the data arrives into the warehouse faster – think streams of many millions of events per second constantly arriving
– the time it takes for the data to be optimally queryable is faster – query immediately upon arrival with no need for processing or aggregation or compaction
– the speed at which queries run is faster – small, selective queries are measured in 10s or 100s of milliseconds; large, scan- or compute-heavy queries are processed at very high bandwidth
– mutations of the data, when needed, are fast – if data needs to be corrected or updated for whatever reason, this can be done in place without large rewrites
Read on for more.