The Data Lake From 10,000 Feet

Pradeep Menon has a high-level explanation of what a data lake is and how it differs from traditional data warehouses:

With the changes in the data paradigm, a new architectural pattern has emerged. It’s called as the Data Lake Architecture. Like the water in the lake, data in a data lake is in the purest possible form. Like the lake, it caters to need to different people, those who want to fish or those who want to take a boat ride or those who want to get drinking water from it, a data lake architecture caters to multiple personas. It provides data scientists an avenue to explore data and create a hypothesis. It provides an avenue for business users to explore data. It provides an avenue for data analysts to analyze data and find patterns. It provides an avenue for reporting analysts to create reports and present to stakeholders.

The way I compare a data lake to a data warehouse or a mart is like this:

Data Lake stores data in the purest form caters to multiple stakeholders and can also be used to package data in a form that can be consumed by end-users. On the other hand, Data Warehouse is already distilled and packaged for defined purposes.

One way of thinking about this is that data warehouses are great for solving known business questions:  generating 10K reports or other regulatory compliance reporting, building the end-of-month data, and viewing standard KPIs.  By contrast, the data lake is (among other things) for spelunking, trying to answer those one-off questions people seem to have but which the warehouse never seems to have quite the right set of information.

Related Posts

On Whether Relational Data Belongs In A Data Lake

Melissa Coates debates whether relational data really belongs in a data lake: For certain types of data, writing it to the data lake really is frequently the best choice. This is often true for low latency IoT data, semi-structured data like logs, and varying structures such as social media data. However, the handling of structured […]

Read More

The Value Of Power BI Dataflows

Matt Allington gets to the core benefits of Power BI Dataflows: Dataflows are: An online service provided by Microsoft as part of Power BI (software as a service, or SaaS). In effect dataflows are an online data collection and storage tool. Collection:  It uses Power Query to connect to the data at the source and transform that data as […]

Read More

Categories

August 2017
MTWTFSS
« Jul Sep »
 123456
78910111213
14151617181920
21222324252627
28293031