On Whether Relational Data Belongs In A Data Lake

Melissa Coates debates whether relational data really belongs in a data lake:

For certain types of data, writing it to the data lake really is frequently the best choice. This is often true for low latency IoT data, semi-structured data like logs, and varying structures such as social media data. However, the handling of structured data which originates from a relational database is much less clear.

Most data lake technologies store data as files (like csv, json, or parquet). This means that when we extract relational data into a file stored in a data lake, we lose valuable metadata from the database such as data types, constraints, foreign keys, etc. I tend to say that we “de-relationalize” data when we write it to a file in the data lake. If we’re going to turn right around and load that data to a relational database destination, is it the right call to write it out to a file in the data lake as an intermediary step?

Click through for considerations on both sides of the fence.

Related Posts

Azure Blob Storage and Data Lake Storage Gen2

Melissa Coates shows what you need to know about Azure Blob Storage with Azure Data Lake Storage Gen2: – You may need to consider separate storage accounts if you need to segregate access control (RBAC), virtual networks, access keys, and the like. (Note that RBAC can also be set at the container level too, but […]

Read More

Learning More About Azure Data Lake Storage Gen2

Melissa Coates shares a compendium of links pertaining to Azure Data Lake Storage Generation 2: A couple of people have asked me recently about how to ‘bone up’ on the new data lake service in Azure. The way I see it, there are two aspects: A, the technology itself and B, data lake principles and […]

Read More

Categories

November 2018
MTWTFSS
« Oct Dec »
 1234
567891011
12131415161718
19202122232425
2627282930