Data Lake Planning

Melissa Coates discusses some of the planning involved with creating a data lake:

Does a Data Lake Replace a Data Warehouse?

I’m biased here, and a firm believer that modern data warehousing is still very important. Therefore, I believe that a data lake, in and of itself, doesn’t entirely replace the need for a data warehouse (or data marts) which contain cleansed data in a user-friendly format. The data warehouse doesn’t absolutely have to be in a relational database anymore, but it does need a semantic layer which is easy to work with that most business users can access for the most common reporting needs.

On this question, my answer is “Absolutely not.”  Data warehouses are designed to answer specific, known business questions.  They’re great for regulatory reporting, quarterly reports to shareholders, weekly reports to management, etc.  Data lakes are designed for ad hoc analysis of information.  Read the whole thing.

Related Posts

Data Lakes And Data Swamps

Randolph West talks about data lakes: Internet companies including search engines (Google, Bing), social media companies (Facebook, Twitter), and email providers (Yahoo!, Outlook.com) are managing data stores measured in petabytes. On a daily basis these organizations handle all sorts of structured and unstructured data. Assuming they put all their data in one repository, that could […]

Backing Up Azure Data Lake Store Data

Hugo Almeida has some hints for backing up Azure Data Lake Store data using Azure Data Factory: Our Hadoop HDP IaaS cluster on Azure uses Azure Data Lake Store (ADLS) for data repository and accesses it through an applicational user created on Azure Active Directory (AAD). Check this tutorial if you want to connect your own Hadoop […]
