Turns out there are three basic zones or areas to a data lake. Raw, Managed, and Presentation.
The raw zone should be optimized for fast storage. The goal is to get the data in as quickly as possible. Don’t make any changes to this data. You want it stored as close to the original format as possible. It sounds just like staged data to me. Data you’d build an extract package to get from source to your staging environment, right?
Maybe you’re thinking this is just a coincidence…let’s move on.
Spoilers: it’s not a coincidence.