Getting Started With Zeppelin

Sangeeta Gulia shows us how to get started building notebooks with Apache Zeppelin on top of Spark:

There are 3 interpreter modes available in Zeppelin.

1) Shared Mode

In Shared mode, a SparkContext and a Scala REPL is being shared among all interpreters in the group. So every Note will be sharing single SparkContext and single Scala REPL. In this mode, if NoteA defines variable ‘a’ then NoteB not only able to read variable ‘a’ but also able to override the variable.

2) Scoped Mode

In Scoped mode, each Note has its own Scala REPL. So variable defined in a Note can not be read or overridden in another Note. However, still single SparkContext serves all the Interpreter Groups. And all the jobs are submitted to this SparkContext and fair scheduler schedules the job. This could be useful when user does not want to share Scala session, but want to keep single Spark application and leverage its fair scheduler.

3) Isolated Mode

In Isolated mode, each Note has its own SparkContext and Scala REPL.

The default mode of %spark interpreter is ‘Globally Shared’.

This is mostly a step-by-step on installing Zeppelin, but does go into some detail on how Zeppelin works.

Related Posts

Defending Pie Charts

Bobby Johnson makes a valiant effort at defending the indefensible: In the world of data analysis, there are few things more reviled than the pie chart. Among “serious” data people, it is at best trivial and naive, and at worst downright evil. I do not agree with this. The pie chart is simple, but that […]

Read More

Cassandra To Kafka Connect

Mike Barlotta shows how to feed data into Kafka from Cassandra via Kafka Connect.  Part one involves basic setup: Modeling data in Cassandra must be done around the queries that are needed to access the data (see this article for details). Typically this means that there will be one table for each query and data (in our […]

Read More


November 2017
« Oct Dec »