Data Lakes Aren’t New

Shannon Lowder reveals one of the deep, dark data lake secrets:

Turns out there are three basic zones, or areas, to a data lake: Raw, Managed, and Presentation.

The raw zone should be optimized for fast storage. The goal is to get the data in as quickly as possible. Don’t make any changes to this data. You want it stored as close to the original format as possible. It sounds just like staged data to me. Data you’d build an extract package to get from source to your staging environment, right?

Maybe you’re thinking this is just a coincidence…let’s move on.

Spoilers: it’s not a coincidence.
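To make the "don't make any changes" rule for the raw zone concrete, here is a minimal sketch of landing a source extract byte-for-byte, the same job a staging extract package does. It assumes a local filesystem stands in for data lake storage and uses a hypothetical raw/<source>/<load date>/ folder layout; the function names `land_raw` and `sha256` are illustrative only and do not come from Lowder's post or any data lake product.

```python
import hashlib
import shutil
import tempfile
from datetime import date
from pathlib import Path


def land_raw(source_file: Path, raw_root: Path, source_name: str, load_date: date) -> Path:
    """Copy a source extract into the raw zone exactly as received.

    No parsing, type conversion, or cleansing happens here; that work is
    deferred to later zones. The raw/<source>/<yyyy-mm-dd>/ layout is a
    hypothetical convention chosen for this sketch.
    """
    target_dir = raw_root / source_name / load_date.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source_file.name
    # copyfile duplicates the bytes as-is, which keeps ingestion fast and lossless
    shutil.copyfile(source_file, target)
    return target


def sha256(path: Path) -> str:
    """Checksum used to confirm the landed copy matches the source byte-for-byte."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "orders.csv"
        # Messy values (stray spaces, "N/A") are deliberately left untouched
        src.write_bytes(b"id,amount\n1, 19.99 \n2,N/A\n")
        landed = land_raw(src, root / "raw", "erp", date(2017, 10, 1))
        print(landed.relative_to(root))
        print("unchanged:", sha256(src) == sha256(landed))
```

The checksum comparison is the point of the design: whatever arrived from the source is exactly what sits in the raw zone, and any cleanup happens downstream in the managed zone, just as it would after a staging load.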
