Data Lakes Aren’t New

Shannon Lowder reveals one of the deep, dark data lake secrets:

Turns out there are three basic zones or areas to a data lake. Raw, Managed, and Presentation.

The raw zone should be optimized for fast storage.  The goal is to get the data in as quickly as possible.  Don’t make any changes to this data.  You want it stored as close to the original format as possible.  It sounds just like staged data to me.  Data you’d build an extract package to get from source to your staging environment, right?

Maybe you’re thinking this is just a coincidence…let’s move on.

Spoilers:  it’s not a coincidence.

Related Posts

Data Classifications on Azure SQL DW

Meagan Longoria takes us through data classifications on Azure SQL Data Warehouse: Data classifications in Azure SQL DW entered public preview in March 2019. They allow you to label columns in your data warehouse with their information type and sensitivity level. There are built-in classifications, but you can also add custom classifications. This could be an important […]

Read More

Populating a Data Vault Model with Azure Data Factory

Rayis Imayev gives us an example of ELT into a Data Vault model using Azure Data Factory: To make a full transition from the existing  DW model to an alternative Data Vault I removed all Surrogate Keys and other attributes that are only necessary to support Kimball data warehouse methodology. Also, I needed to add […]

Read More

Categories

October 2017
MTWTFSS
« Sep Nov »
 1
2345678
9101112131415
16171819202122
23242526272829
3031