
CSV Data Ingestion with Spark

Jean Georges Perrin shows how you can easily load CSV data with Spark:

Fortunately for you, Apache Spark offers a variety of options for ingesting those CSV files. Ingesting CSV is easy and schema inference is a powerful feature.
Let’s have a look at more advanced examples with more options that illustrate the complexity of CSV files in the outside world. You’ll first look at the file you’ll ingest, and understand its specifications. You’ll then have a look at the result and finally build the mini-application to achieve the result. This pattern repeats for each format.

It’s good to see some of the lesser-used features pop up like date format and multi-line support (which I hadn’t even known about).
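
For a rough idea of what those options look like in practice, here is a minimal PySpark sketch. It is not taken from the linked article: the function names, the `;` separator, the `*` quote character, and the date pattern are assumptions chosen for illustration. The option keys themselves (`header`, `multiLine`, `sep`, `quote`, `dateFormat`, `inferSchema`) are standard settings of Spark's CSV reader.

```python
from typing import Dict


def csv_reader_options(
    sep: str = ";",
    quote: str = "*",
    date_format: str = "MM/dd/yyyy",
) -> Dict[str, str]:
    """Build the option map for Spark's CSV reader.

    The default values are illustrative assumptions, not values
    taken from the linked article.
    """
    return {
        "header": "true",           # first line holds the column names
        "multiLine": "true",        # quoted fields may contain line breaks
        "sep": sep,                 # field separator
        "quote": quote,             # character that wraps quoted fields
        "dateFormat": date_format,  # pattern used to parse date columns
        "inferSchema": "true",      # let Spark guess the column types
    }


def load_csv(spark, path: str):
    """Read a CSV file into a DataFrame using the options above.

    `spark` is expected to be an existing SparkSession; it is passed in
    so that this module does not need to create one itself.
    """
    return spark.read.format("csv").options(**csv_reader_options()).load(path)
```

With a live session, something like `load_csv(spark, "data/books.csv")` (a hypothetical path) would return a DataFrame whose `printSchema()` output shows the inferred column types. Keeping the options in a plain dictionary makes it easy to tweak one setting, such as the date pattern, without touching the read call.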