SparkR + Zeppelin

I take a look at using SparkR and Zeppelin:

My goal is to do some of the things that I did in my Touching on Advanced Topics post.  Originally, I wanted to replicate that analysis in its entirety using Zeppelin, but this proved to be pretty difficult, for reasons that I mention below.  As a result, I was only able to do some—but not all—of the anticipated work.  I think a more seasoned R / SparkR practitioner could do what I wanted, but that’s not me, at least not today.

With that in mind, let’s start messing around.

SparkR is a bit of a mindset change from traditional R.

Related Posts

Beware Multi-Assignment dplyr::mutate() Statements

John Mount hits on an issue when using dplyr backed by a database in R: Notice the above gives an incorrect result: all of the x_i columns are identical, and all of the y_i columns are identical. I am not saying the above code is in any way desirable (though something like it does arise naturally in certain test […]

Read More

Leveraging Hive In Pyspark

Fisseha Berhane shows how to use Spark to connect Python to Hive: If we are using earlier Spark versions, we have to use HiveContext which is variant of Spark SQL that integrates with data stored in Hive. Even when we do not have an existing Hive deployment, we can still enable Hive support. In this […]

Read More

Categories

July 2016
MTWTFSS
« Jun Aug »
 123
45678910
11121314151617
18192021222324
25262728293031