Defining A Data Lake

Derik Hammer gives us a definition of the data lake:

Data lake, a term originally coined by James Dixon, the founder and CTO of Pentaho, is used to describe a data store which can scale to extremely large sizes, in an affordable manner. A data lake is also designed to store the raw data, in its original format, so it can be used immediately, rather than waiting weeks for the IT department to massage it into a format that the data warehouse can accept and/or use effectively.

The data lake concept always includes the capability to scale to an enormous size. However, you do not need petabytes of data to find use in a data lake. It can be used as cheap storage for long-term archival data. It can be used to transform data before attempting to ingest into a data warehouse with the convenience of retaining the original and transformed versions of the data. It also can be used as the centralized staging location for ingestion into the data warehouse, simplifying the loading processes.

I would like to take this opportunity to remind readers that the Aristotelian opposite of the Data Lake is the Data Swamp.  Derik uses this term as well and it makes me feel warm and fuzzy inside to see broad adoption of this term.

Related Posts

Azure Data Lake Analytics Updates

Michael Rys has a boatload of new updates for Azure Data Lake: The top items include expanding our built-in support for standard file formats with native Parquet support for extractors and outputters (in public preview) and ORC (in private preview)! In addition, since the fast file set feature now has been generally released, we can consume hundreds of thousands of such […]

Read More

Using AU Analyzer To Lower Data Lake Analytics Costs

Matthew Hicks shows off the Data Lake Analytics AU Analyzer: The AU Analyzer looks at all the vertices (or nodes) in your job, analyzes how long they ran and their dependencies, then models how long the job might run if a certain number of vertices could run at the same time. Each vertex may have […]

Read More

Categories

January 2018
MTWTFSS
« Dec Feb »
1234567
891011121314
15161718192021
22232425262728
293031