Data Lakes Aren’t New

Shannon Lowder reveals one of the deep, dark data lake secrets:

Turns out there are three basic zones or areas to a data lake. Raw, Managed, and Presentation.

The raw zone should be optimized for fast storage.  The goal is to get the data in as quickly as possible.  Don’t make any changes to this data.  You want it stored as close to the original format as possible.  It sounds just like staged data to me.  Data you’d build an extract package to get from source to your staging environment, right?

Maybe you’re thinking this is just a coincidence…let’s move on.

Spoilers:  it’s not a coincidence.

Related Posts

Backing Up Azure Data Lake Store Data

Hugo Almeida has some hints for backing up Azure Data Lake Store data using Azure Data Factory: Our Hadoop HDP IaaS cluster on Azure uses Azure Data Lake Store (ADLS) for data repository and accesses it through an applicational user created on Azure Active Directory (AAD). Check this tutorial if you want to connect your own Hadoop […]

Read More

Alerting On Azure Data Lake Store Data Usage

Jose Lara shows off an interesting feature in Azure Data Lake Store: The massive scale and capabilities of Azure Data Lake Store are regularly used by companies for big data storage. As the number of files, file types, and folders grow, things get harder to manage and staying compliant becomes a greater challenge for companies. […]

Read More

Categories

October 2017
MTWTFSS
« Sep Nov »
 1
2345678
9101112131415
16171819202122
23242526272829
3031