The Data Lake From 10,000 Feet

Pradeep Menon has a high-level explanation of what a data lake is and how it differs from traditional data warehouses:

With the changes in the data paradigm, a new architectural pattern has emerged. It’s called as the Data Lake Architecture. Like the water in the lake, data in a data lake is in the purest possible form. Like the lake, it caters to need to different people, those who want to fish or those who want to take a boat ride or those who want to get drinking water from it, a data lake architecture caters to multiple personas. It provides data scientists an avenue to explore data and create a hypothesis. It provides an avenue for business users to explore data. It provides an avenue for data analysts to analyze data and find patterns. It provides an avenue for reporting analysts to create reports and present to stakeholders.

The way I compare a data lake to a data warehouse or a mart is like this:

Data Lake stores data in the purest form caters to multiple stakeholders and can also be used to package data in a form that can be consumed by end-users. On the other hand, Data Warehouse is already distilled and packaged for defined purposes.

One way of thinking about this is that data warehouses are great for solving known business questions:  generating 10K reports or other regulatory compliance reporting, building the end-of-month data, and viewing standard KPIs.  By contrast, the data lake is (among other things) for spelunking, trying to answer those one-off questions people seem to have but which the warehouse never seems to have quite the right set of information.

Related Posts

Rant: NoSQL Isn’t A Thing

Grant Fritchey is in rare form today: Go and search through it for a NoSQL data management system. I’ll wait. You found one that was named NoSQL didn’t you. Oracle NoSQL, because Oracle. Of course. However, under the Database Model what did it say? Document Store. Why? BECAUSE NOSQL IS NOT A DATA STORAGE ENGINE! […]

Read More

Query Lables In Azure SQL Data Warehouse

Arun Sirpal demonstrates how to use query labels in Azure SQL Data Warehouse: Using a query label in Azure SQL DW (Data Warehouse) can be a really handy technique to track queries via DMVs. You might want to do this to see what problematic queries are doing under the covers. Let’s check out an example. […]

Read More

Categories

August 2017
MTWTFSS
« Jul Sep »
 123456
78910111213
14151617181920
21222324252627
28293031