Crime Analysis

Kevin Feasel


R, Spark

Raghavan Madabusi combines Zeppelin, R, and Spark to perform crime analysis:

Apache Zeppelin, a web-based notebook, enables interactive data analytics including Data Ingestion, Data Discovery, and Data Visualization all in one place. Zeppelin interpreter concept allows any language/data-processing-backend to be plugged into Zeppelin. Currently, Zeppelin supports many interpreters such as Spark (Scala, Python, R, SparkSQL), Hive, JDBC, and others. Zeppelin can be configured with existing Spark eco-system and share SparkContext across Scala, Python, and R.

This links to a rather long post on how to set up and use all of these pieces.  I’m more familiar with Jupyter than Zeppelin, but regardless of the notebook you choose, this is a good exercise to become familiar with the process.

Related Posts

Polishing Uncalibrated Models

Nina Zumel takes an uncalibrated random forest model and applies a calibration technique to improve the estimate on one variable: In the previous article in this series, we showed that common ensemble models like random forest and gradient boosting are uncalibrated: they are not guaranteed to estimate aggregates or rollups of the data in an unbiased way. […]

Read More

Generating Excel Spreadsheets from Shiny

Richard Hill and Andy Merlino show how you can export data from a Shiny app into Excel: R is great for report generation. Shiny allows us to easily create web apps that generate a variety of reports with R. This post details a demo Shiny app that generates an Excel report, a PowerPoint report, and a PDF […]

Read More


May 2016
« Apr Jun »