Press "Enter" to skip to content

Real-Time Data Warehousing in Cloudera

Justin Hayes gives us an overview of using Cloudera Data Platform for real-time data warehousing:

The simplest way to describe a RTDW is that it looks and feels like a normal data warehouse, but everything is faster even while massive scale is maintained. It is a type of data warehouse modernization that lets you have “small data” semantics and performance at “big data” scale.

– the data arrives into the warehouse faster – think streams of many millions of events per second constantly arriving

– the time it takes for the data to be optimally queryable is faster – query immediately upon arrival with no need for processing or aggregation or compaction

– the speed at which queries run is faster – small, selective queries are measured in 10s or 100s of milliseconds; large, scan- or compute-heavy queries are processed at very high bandwidth

– mutations of the data, when needed, are fast – if data needs to be corrected or updated for whatever reason, this can be done in place without large rewrites

Read on for more.