Press "Enter" to skip to content

Data Lake File Formats and Security

Ashish Kumar and Jorge Villamariona continue a series on data lakes:

People from a traditional RDBMS background are often surprised at the extraordinary amount of control that data lake architects have over how datasets can be stored. Data Lake Architects, as opposed to the Relational Database Administrators, get to determine an array of elements such as file sizes, type of storage (row vs. columnar), degrees of compression, indexing, schemas, and block sizes. These are related to the big data oriented ecosystem of formats commonly used for storing and accessing information in a data lake.

It is a bit of a different world and it comes with trade-offs. The whole thing is worth reading.