What is Parquet and Why Use It?

The folks at Jumping Rivers explain what the Parquet file format is and how you can use it in R:

Apache Parquet is a popular columnar storage file format used by Hadoop systems, such as Pig, Spark, and Hive. The file format is language independent and has a binary representation. Parquet is used to efficiently store large data sets and has the extension .parquet. This blog post aims to explain how Parquet works and the tricks it uses to efficiently store data.

Read on for that explanation and plenty of sample code.
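The Jumping Rivers post works in R. As a language-neutral illustration, here is a toy Python sketch of two storage tricks commonly associated with Parquet's columnar layout: grouping values by column, then dictionary-encoding a column and run-length-encoding the resulting indices. The helper names (`to_columns`, `dictionary_encode`, `run_length_encode`) and the sample data are hypothetical. This is not Parquet's actual on-disk format, which also involves row groups, pages, metadata, a hybrid RLE/bit-packing encoding and optional compression.

```python
# Toy illustration only: NOT the real Parquet binary format.

def to_columns(rows):
    """Pivot row-oriented records into one list per column (a columnar layout)."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def dictionary_encode(values):
    """Replace each value with an integer index into a table of distinct values."""
    dictionary = []  # distinct values, in first-seen order
    index_of = {}    # value -> position in `dictionary`
    indices = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


# Hypothetical sample data: repeated strings compress well once stored by column.
rows = [
    {"city": "Leeds", "temp": 12},
    {"city": "Leeds", "temp": 13},
    {"city": "Leeds", "temp": 11},
    {"city": "York", "temp": 10},
    {"city": "York", "temp": 9},
]

columns = to_columns(rows)
dictionary, indices = dictionary_encode(columns["city"])
runs = run_length_encode(indices)
print(dictionary)  # ['Leeds', 'York']
print(indices)     # [0, 0, 0, 1, 1]
print(runs)        # [(0, 3), (1, 2)]
```

Storing by column puts similar values next to each other, which is why cheap encodings like these can shrink a repetitive column far more than they could shrink interleaved row data.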