Thinking About The Data Lake

Ust Oldfield gives architectural hints on Azure Data Lake Store:

It is very easy to treat a data lake as a dumping ground for anything and everything. Microsoft’s sale pitch says exactly this – “Storage is cheap, Store everything!!”. We tend to agree – but if the data is completely malformed, inaccurate, out of date or completely unintelligible, then it’s no use at all and will confuse anyone trying to make sense of the data. This will essentially create a data swamp, which no one will want to go into. Bad data & poorly managed files erode trust in the lake as a source of information. Dumping is bad.

This is how you get data swamps (a term which I’m so happy is catching on).  Read the whole thing.

Related Posts

Azure Data Lake Storage Generation 2

James Baker announces updates to Azure Data Lake Storage Gen2: As we’ve discussed many times, the performance of the storage layer has an outsized impact on the total cost of ownership (TCO) for your complete analytics pipeline. This is due to the fact that every percentage point improvement in storage performance results in that same […]

Read More

When Structured Data Makes Sense In A Data Lake

James Serra ponders when it makes sense to add structured data to a data lake: I would not say it’s common place to load structured data into the data lake, but I do see it frequently. In most cases it is not necessary to first copy relational source data into the data lake and then […]

Read More

Categories

November 2016
MTWTFSS
« Oct Dec »
 123456
78910111213
14151617181920
21222324252627
282930