Defining A Data Lake

Derik Hammer gives us a definition of the data lake:

Data lake, a term originally coined by James Dixon, the founder and CTO of Pentaho, is used to describe a data store which can scale to extremely large sizes, in an affordable manner. A data lake is also designed to store the raw data, in its original format, so it can be used immediately, rather than waiting weeks for the IT department to massage it into a format that the data warehouse can accept and/or use effectively.

The data lake concept always includes the capability to scale to an enormous size. However, you do not need petabytes of data to find use in a data lake. It can be used as cheap storage for long-term archival data. It can be used to transform data before attempting to ingest into a data warehouse with the convenience of retaining the original and transformed versions of the data. It also can be used as the centralized staging location for ingestion into the data warehouse, simplifying the loading processes.

I would like to take this opportunity to remind readers that the Aristotelian opposite of the Data Lake is the Data Swamp.  Derik uses this term as well and it makes me feel warm and fuzzy inside to see broad adoption of this term.

Related Posts

Comparing Data Lake Job Runs

Yanan Cai shows how to compare stats on different executions of a job: Troubleshooting issues in recurring job is a time-consuming task. It starts with searching through the Job Browser to find instances of a recurring job and identifying both baseline and anomalous performance. This is followed by multi-way comparisons between job instances to figure out what […]

Read More

Data Lake Zones

Melissa Coates walks us through the different layers of a data lake: As we are approaching the end of 2017, many people have resolutions or goals for the new year. How about a goal to get organized…in your data lake? The most important aspect of organizing a data lake is optimal data retrieval. Click through for […]

Read More

Categories

January 2018
MTWTFSS
« Dec Feb »
1234567
891011121314
15161718192021
22232425262728
293031