The folks at Jumping Rivers explain what the Parquet file format is and how you can use it in R:
Apache Parquet is a popular column storage file format used by Hadoop systems, such as Pig, Spark, and Hive. The file format is language independent and has a binary representation. Parquet is used to efficiently store large data sets and has the extension .parquet. This blog post aims to understand how parquet works and the tricks it uses to efficiently store data.
Read on for that explanation and plenty of sample code.
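For a quick taste before clicking through, here is a minimal sketch of a Parquet round trip in R. It is not taken from the Jumping Rivers post; it assumes the arrow package is installed and uses the built-in mtcars data set and a temporary file purely for illustration.

```r
# Assumption: the arrow package is available (install.packages("arrow"))
library(arrow)

# Write a built-in data frame to a temporary file with the .parquet extension
path <- tempfile(fileext = ".parquet")
write_parquet(mtcars, path)

# Read back only two columns; with a column-oriented format the reader
# does not need to load the remaining columns
cars_subset <- read_parquet(path, col_select = c("mpg", "cyl"))
head(cars_subset)
```

Selecting columns at read time is one place where the column storage layout described above tends to pay off, since unneeded columns can be skipped.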