Thinking About The Data Lake

Ust Oldfield gives architectural hints on Azure Data Lake Store:

It is very easy to treat a data lake as a dumping ground for anything and everything. Microsoft’s sale pitch says exactly this – “Storage is cheap, Store everything!!”. We tend to agree – but if the data is completely malformed, inaccurate, out of date or completely unintelligible, then it’s no use at all and will confuse anyone trying to make sense of the data. This will essentially create a data swamp, which no one will want to go into. Bad data & poorly managed files erode trust in the lake as a source of information. Dumping is bad.

This is how you get data swamps (a term which I’m so happy is catching on).  Read the whole thing.

Related Posts

Azure Blob Storage and Data Lake Storage Gen2

Melissa Coates shows what you need to know about Azure Blob Storage with Azure Data Lake Storage Gen2: – You may need to consider separate storage accounts if you need to segregate access control (RBAC), virtual networks, access keys, and the like. (Note that RBAC can also be set at the container level too, but […]

Read More

Learning More About Azure Data Lake Storage Gen2

Melissa Coates shares a compendium of links pertaining to Azure Data Lake Storage Generation 2: A couple of people have asked me recently about how to ‘bone up’ on the new data lake service in Azure. The way I see it, there are two aspects: A, the technology itself and B, data lake principles and […]

Read More

Categories

November 2016
MTWTFSS
« Oct Dec »
 123456
78910111213
14151617181920
21222324252627
282930