Praveen Sripati shows how to use Spark DataFrames to convert a CSV file to Parquet format:
In the previous blog, we looked at converting the CSV format into Parquet format using Hive. It was a matter of creating a regular table, mapping it to the CSV data, and finally moving the data from the regular table to the Parquet table using the Insert Overwrite syntax. In this blog we will look at how to do the same thing with Spark using the DataFrames feature.
Most of the code is basic setup; writing to Parquet is really a one-liner.
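To give a feel for how short the conversion is, here is a minimal PySpark sketch. It is not the code from the linked post: it assumes Spark 2.0 or later (the `SparkSession` API with the built-in CSV reader), a CSV file with a header row, and placeholder paths `input.csv` and `output.parquet`. The helper name `csv_to_parquet` is made up for illustration.

```python
def csv_to_parquet(spark, csv_path, parquet_path):
    """Read a CSV file into a DataFrame and write it back out as Parquet.

    Assumes the CSV has a header row; column types are inferred by Spark,
    which costs an extra pass over the data.
    """
    df = (
        spark.read
        .option("header", "true")       # first line holds column names
        .option("inferSchema", "true")  # guess column types instead of all strings
        .csv(csv_path)
    )
    # The actual conversion: one call on the DataFrame writer.
    df.write.mode("overwrite").parquet(parquet_path)
    return df


def main():
    # Imported here so the helper above can be read without Spark installed.
    from pyspark.sql import SparkSession

    # Basic setup: a local session with a hypothetical application name.
    spark = (
        SparkSession.builder
        .appName("csv-to-parquet")
        .master("local[*]")
        .getOrCreate()
    )
    try:
        # Placeholder paths; substitute your own input and output locations.
        csv_to_parquet(spark, "input.csv", "output.parquet")
    finally:
        spark.stop()
```

Calling `main()` runs the conversion against a local Spark session. The `overwrite` mode mirrors the Insert Overwrite behavior in the Hive version; drop it if you would rather have the write fail when the output path already exists.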