Aki Ariga continues his sparklyr series with some analysis of US flight data:
In this post, we will show you a visualization and build a predictive model of US flights with sparklyr. Flight visualization code is based on this article.
This post assumes you already have the following tables:
- Airlines data as
airlines_bi_pq. It is assumed to be on S3, but you can put it into HDFS. See also the Ibis project.
- Airports data converted into Parquet format as
airports_new_pq. See also 2009 ASA Data Expo.
You should make these tables available through Apache Hive or Apache Impala (incubating) with Hue.
There’s some setup work to get this going, but getting a handle on sparklyr looks to be a good idea if you’re in the analytics space.