Thinking About The Data Lake

James Serra explains at a high level what the data lake metaphor is and how it works:

The data lake introduces a new data analysis paradigm shift:

OLD WAY: Structure -> Ingest -> Analyze

NEW WAY: Ingest -> Analyze -> Structure

This allows you to avoid a lot of up-front work before you are able to analyze data. With the old way, you have to know the questions to ask. The new way supports situations when you don’t know the questions to ask.

This solves the two biggest reasons why many EDW projects fail:

Too much time spent modeling when you don’t know all of the questions your data needs to answer
Wasted time spent on ETL where the net effect is a star schema that doesn’t actually show value

There are some good details here. My addition would be to reiterate the importance of a good data governance policy.

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30
31