Warehouses Will Live On

Jesse Seymour argues that in-memory analysis solutions will not entirely replace data warehouses:

The big reason that dimensional modeling increases clarity is that the dimensional model seeks to flatten data as much as possible.  Let’s compare two examples.  Both of these examples are for a fictional health clinic.

The first example is that we want a report on how many male patients were  treated with electric shock therapy by provider, grouped monthly and spanning year to date range.

Those big Kimball-style warehouses do a great job of making it easier for people who are not database specialists to query data and get meaningful, consistent results to known business questions.  The trick to understanding data platforms is that they tend to be complements rather than substitutes:  introducing Spark-R in your environment does not replace your Kimball-style warehouse; it complements it by letting analysts find trends more easily.  Similarly, a Hadoop cluster potentially lets you complement an existing data warehouse in a few ways:  acting as a data aggregator (which allows you to push some ETL work off onto the cluster), a data collector (especially for information which is useful but doesn’t really fit in your conformed warehouse), and a data processor (particularly for those gigantic queries which are not time-sensitive).

Related Posts

HDP 3.0 Updates To Hive And Druid

Nishant Bangarwa has some updates to Apache Druid in HDP 3.0: There are numerous improvements that went into HDP 3.0 and the performance improvements shown are an aggregate result of all of them. Here are some of the more noteworthy improvements related to Druid-Hive integration : Druid Expressions Support – HIVE-18893/ CALCITE-2170   added support for Druid expressions in Hive. […]

Read More

Redshift Architecture Performance Tips

John Ryan has a few hints to help us build speedy Redshift clusters: The Need to Vacuum As Redshift does not reclaim free space automatically, updates and delete operations can frequently lead to table growth. Equally, it’s important as new entries are added, that the data is maintained in a sorted sequence. The VACUUM command is […]

Read More

Categories

May 2016
MTWTFSS
« Apr Jun »
 1
2345678
9101112131415
16171819202122
23242526272829
3031