Apache Parquet is a popular columnar storage file format used by tools in the Hadoop ecosystem, such as Pig, Spark, and Hive. The format is language independent and binary. Parquet is designed to store large data sets efficiently, and its files use the extension .parquet. This blog post aims to explain how Parquet works and the tricks it uses to store data efficiently.
Read on for that explanation and plenty of sample code.
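
To make "columnar" concrete before going further, here is a minimal Python sketch of the general idea. It is not Parquet's actual on-disk layout; the sample records and the `to_columns` helper are made up for illustration.

```python
# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"id": 1, "country": "US", "amount": 10.0},
    {"id": 2, "country": "US", "amount": 12.5},
    {"id": 3, "country": "DE", "amount": 7.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Reading one field now touches a single list instead of every record.
total_amount = sum(columns["amount"])

print(columns["country"])
print(total_amount)
```

In the column-oriented form, values of the same type sit next to each other, and repeated values (such as the two "US" entries) end up adjacent. That is the property that columnar formats can exploit for compression and for reading only the fields a query needs.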