Submitting A Spark Job On HDInsight

Bharath Venkatesh shows different ways to run a Spark job on HDInsight:

From HDI 3.5 onwards, our clusters come preinstalled with Zeppelin Notebooks. Much like Jupyter notebooks, Zeppelin is a web-based notebook that enables interactive data analytics. It provides built-in Spark intergration that allows for:

  • Automatic SparkContext and SQLContext injection
  • Runtime jar dependency loading from local filesystem or maven repository. Learn more about dependency loader.
  • Canceling job and displaying its progress

This MSDN article provides a quick easy-to-use onboarding guide to help get acclimatized to Zeppelin. You can also try several applications that come pre-installed on your cluster to get hands on experience of Zeppelin.

Zeppelin is probably my favorite method, but there are good reasons to use all of these.

Related Posts

Mounting HDFS As A Local Filesystem

Guy Shilo looks at two techniques for mounting HDFS as a local filesystem: NFS Gateway is a HDFS component that enables the use to expose HDFS through NFS3 interface so that Linux machines can mount it and access it just as a local filesystem. The manual installation is quite cumbersome and is covered here. Cloudera manager […]

Read More

How Humio Uses Kafka

Kresten Krab describes ways that Humio uses Apache Kafka for their product: Humio is a log analytics system built to run both on-prem and as a hosted offering. It is designed for “on-prem first” because, in many logging use cases, you need the privacy and security of managing your own logging solution. And because volume […]

Read More

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031