Defining A Data Lake

Derik Hammer gives us a definition of the data lake:

Data lake, a term originally coined by James Dixon, the founder and CTO of Pentaho, is used to describe a data store which can scale to extremely large sizes, in an affordable manner. A data lake is also designed to store the raw data, in its original format, so it can be used immediately, rather than waiting weeks for the IT department to massage it into a format that the data warehouse can accept and/or use effectively.

The data lake concept always includes the capability to scale to an enormous size. However, you do not need petabytes of data to find use in a data lake. It can be used as cheap storage for long-term archival data. It can be used to transform data before attempting to ingest into a data warehouse with the convenience of retaining the original and transformed versions of the data. It also can be used as the centralized staging location for ingestion into the data warehouse, simplifying the loading processes.

I would like to take this opportunity to remind readers that the Aristotelian opposite of the Data Lake is the Data Swamp.  Derik uses this term as well and it makes me feel warm and fuzzy inside to see broad adoption of this term.

Related Posts

Azure Data Lake Storage Generation 2

James Baker announces updates to Azure Data Lake Storage Gen2: As we’ve discussed many times, the performance of the storage layer has an outsized impact on the total cost of ownership (TCO) for your complete analytics pipeline. This is due to the fact that every percentage point improvement in storage performance results in that same […]

Read More

When Structured Data Makes Sense In A Data Lake

James Serra ponders when it makes sense to add structured data to a data lake: I would not say it’s common place to load structured data into the data lake, but I do see it frequently. In most cases it is not necessary to first copy relational source data into the data lake and then […]

Read More

Categories

January 2018
MTWTFSS
« Dec Feb »
1234567
891011121314
15161718192021
22232425262728
293031