Gaurav Mishra takes us through data lake architecture on AWS:
Landing zone: This is the area where all the raw data comes in, from all the different sources within the enterprise. The zone is strictly meant for data ingestion, and no modelling or extraction should be done at this stage.
Curation zone: Here’s where you get to play with the data. The entire extract-transform-load (ETL) process takes place at this stage, where the data is crawled to understand what it is and how it might be useful. The creation of metadata, or applying different modelling techniques to it to find potential uses, is all done here.
Production zone: This is where your data is ready to be consumed into different application, or to be accessed by different personas.
This is a nice overview of data lake concepts and worth the read if you’re using AWS. Even if not, the same principles (if not the same technologies) apply for Azure, other clouds, and on-prem.
Comments closed