Simplifying Spark Application Development

Ian Hellstrom has scripts to simplify Apache Spark application rollout:

When creating Apache Spark applications, the basic structure is pretty much the same: for sbt you need the same build.sbt and the same imports, and the skeleton application looks the same. All that really changes is the main entry point, that is, the fully qualified class name. Since that's easy to automate, I present a couple of shell scripts that help you create the basic building blocks to kick-start Spark application development and allow you to easily upgrade versions in the configuration.
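To make the idea concrete, here is a minimal sketch of that kind of templating, written in Python rather than shell. It is not Hellstrom's scripts. It shows that once you have a fully qualified class name, both build.sbt and the Scala entry point can be generated mechanically. The function names (split_fqcn, build_sbt, main_scala, scaffold), the project version, and the default Scala 2.11.8 / Spark 1.6.1 versions are all hypothetical. Pass whatever versions your cluster actually runs.

```python
from pathlib import PurePosixPath

# Hypothetical defaults for illustration only; pin the versions your cluster runs.
DEFAULT_SCALA = "2.11.8"
DEFAULT_SPARK = "1.6.1"


def split_fqcn(fqcn: str) -> tuple[str, str]:
    """Split a fully qualified class name such as 'com.example.MyApp'
    into its package ('com.example') and simple name ('MyApp')."""
    package, _, name = fqcn.rpartition(".")
    if not name.isidentifier():
        raise ValueError(f"not a valid class name: {fqcn!r}")
    return package, name


def build_sbt(project: str, scala: str = DEFAULT_SCALA, spark: str = DEFAULT_SPARK) -> str:
    """build.sbt is the same for every project apart from the name and versions."""
    return "\n".join([
        f'name := "{project}"',
        'version := "0.1.0"',
        f'scalaVersion := "{scala}"',
        # "provided": spark-submit supplies Spark at runtime, so it is not bundled.
        f'libraryDependencies += "org.apache.spark" %% "spark-core" % "{spark}" % "provided"',
        "",
    ])


def main_scala(fqcn: str) -> str:
    """Skeleton entry point; only the package line and object name vary."""
    package, name = split_fqcn(fqcn)
    header = f"package {package}\n\n" if package else ""
    return header + (
        "import org.apache.spark.{SparkConf, SparkContext}\n"
        "\n"
        f"object {name} {{\n"
        "  def main(args: Array[String]): Unit = {\n"
        f'    val conf = new SparkConf().setAppName("{name}")\n'
        "    val sc = new SparkContext(conf)\n"
        "    // application logic goes here\n"
        "    sc.stop()\n"
        "  }\n"
        "}\n"
    )


def scaffold(project: str, fqcn: str) -> dict[str, str]:
    """Map relative file paths to file contents, using sbt's standard source layout."""
    package, name = split_fqcn(fqcn)
    parts = [p for p in package.split(".") if p]
    src = PurePosixPath("src/main/scala", *parts, f"{name}.scala")
    return {"build.sbt": build_sbt(project), str(src): main_scala(fqcn)}


if __name__ == "__main__":
    # Print the generated files instead of writing them, so the sketch is side-effect free.
    for path, text in scaffold("my-spark-app", "com.example.MyApp").items():
        print(f"--- {path}\n{text}")
```

Upgrading versions then becomes a matter of changing one argument, for example build_sbt("my-spark-app", spark="2.0.0"), rather than editing every project's build file by hand.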

Check these out if you’re interested in Spark.
