Submitting A Spark Job On HDInsight

Bharath Venkatesh shows different ways to run a Spark job on HDInsight:

From HDI 3.5 onwards, our clusters come preinstalled with Zeppelin Notebooks. Much like Jupyter notebooks, Zeppelin is a web-based notebook that enables interactive data analytics. It provides built-in Spark intergration that allows for:

  • Automatic SparkContext and SQLContext injection
  • Runtime jar dependency loading from local filesystem or maven repository. Learn more about dependency loader.
  • Canceling job and displaying its progress

This MSDN article provides a quick easy-to-use onboarding guide to help get acclimatized to Zeppelin. You can also try several applications that come pre-installed on your cluster to get hands on experience of Zeppelin.

Zeppelin is probably my favorite method, but there are good reasons to use all of these.

Related Posts

Azure Databricks And Active Directory

Tristan Robinson wraps up a two-parter on Azure Databricks security: With the addition of Databricks runtime 5.1 which was released December 2018, comes the ability to use Azure AD credential pass-through. This is a huge step forward since there is no longer a need to control user permissions through Databricks Groups / Bash and then […]

Read More

Recreating Dropped Azure SQL Managed Instance DBs

Jovan Popovic has a script to re-create an Azure SQL Managed Instance database which you might accidentally have dropped: Azure SQL Database – Managed Instance is fully-managed PaaS service that provides advanced disaster-recovery capabilities. Even if you accidentally drop the database or someone drops your database as part of security attack, Managed Instance will enable […]

Read More

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031