Microsoft R Server On Spark

Kevin Feasel

2016-08-10

Cloud, R, Spark

Max Kaznady, et al, discuss using Microsoft R Server on Spark to perform rapid prototyping against the NYC Taxi dataset:

Once the cluster is created, you can connect to the edge node where MRS is already pre-installed by SSHing to r-server.YOURCLUSTERNAME-ssh.azurehdinsight.net with the credentials which you supplied during the cluster creation process. In order to do this in MobaXterm, you can go to Sessions, then New Sessions and then SSH.

The default installation of HDI Spark on Linux cluster does not come with RStudio Server installed on the edge node. RStudio Server is a popular open source integrated development environment (IDE) available for R that provides a browser-based IDE for use by remote clients. This tool allows you to benefit from all the power of R, Spark and Microsoft HDInsight cluster through your browser. In order to install RStudio you can follow the steps detailed in the guide, which reduces to running a script on the edge node.

If you’ve been meaning to get further into Spark & R, this is a great article to follow along with on your own.

Related Posts

Stream-To-Stream Joins In Spark

Ayush Tiwari shows how to join a pair of streams in Apache Spark 2.3: In Spark 2.3, it added support for stream-stream joins, i.e, we can join two streaming Datasets/DataFrames and in this blog we are going to see how beautifully spark now give support for joining the two streaming dataframes. I this example, I […]

Read More

Probabilities And Poker

Steve Miller has a notebook on 5-card draw probabilities: The population of 5 card draw hands, consisting of 52 choose 5 or 2598960 elements, is pretty straightforward both mathematically and statistically. So of course ever the geek, I just had to attempt to show her how probability and statistics converge. In addition to explaining the […]

Read More

Categories

August 2016
MTWTFSS
« Jul Sep »
1234567
891011121314
15161718192021
22232425262728
293031