Data Lake Planning

Melissa Coates discusses some of the planning involved with creating a data lake:

Does a Data Lake Replace a Data Warehouse?

I’m biased here, and a firm believer that modern data warehousing is still very important. Therefore, I believe that a data lake, in an of itself, doesn’t entirely replace the need for a data warehouse (or data marts) which contain cleansed data in a user-friendly format. The data warehouse doesn’t absolutely have to be in a relational database anymore, but it does need a semantic layer which is easy to work with that most business users can access for the most common reporting needs.

On this question, my answer is “Absolutely not.”  Data warehouses are designed to answer specific, known business questions.  They’re great for regulatory reporting, quarterly reports to shareholders, weekly reports to management, etc.  Data lakes are designed for ad hoc analysis of information.  Read the whole thing.

Related Posts

Defining A Data Lake

Derik Hammer gives us a definition of the data lake: Data lake, a term originally coined by James Dixon, the founder and CTO of Pentaho, is used to describe a data store which can scale to extremely large sizes, in an affordable manner. A data lake is also designed to store the raw data, in […]

Read More

Comparing Data Lake Job Runs

Yanan Cai shows how to compare stats on different executions of a job: Troubleshooting issues in recurring job is a time-consuming task. It starts with searching through the Job Browser to find instances of a recurring job and identifying both baseline and anomalous performance. This is followed by multi-way comparisons between job instances to figure out what […]

Read More


October 2016
« Sep Nov »