Aki Ariga continues his sparklyr series with some analysis of US flight data:
In this post, we will show you a visualization and build a predictive model of US flights with sparklyr. Flight visualization code is based on this article.
This post assumes you already have the following tables:
- Airlines data as airlines_bi_pq. It is assumed to be on S3, but you can put it into HDFS. See also the Ibis project.
- Airports data converted into Parquet format as airports_new_pq. See also 2009 ASA Data Expo.
You should make these tables available through Apache Hive or Apache Impala (incubating) with Hue.
There’s some setup work to get this going, but getting a handle on sparklyr looks to be a good idea if you’re in the analytics space.