Word Count In Spark 2.0

Kevin Feasel

2017-01-16

Spark

Anubhav Tarar has a word count app for Spark 2.0:

Now you have to perform the given steps:

  • Create a spark session from org.apache.spark.sql.sparksession api and specify your master and app name

  • Using the sparksession.read.txt method, read from the file wordcount.txt the return value of this method in a dataset. In case you don’t know what a data set looks like you can learn from this link.

  • Split this dataset of type string with white space and create a map which contains the occurence of each word in that data set.

  • Create a class prettyPrintMap for printing the result to console.

This Hello World app is a bit longer than the sheer minimum code necessary, as it includes a class for formatting results and some error handling.

Related Posts

Building TensorFlow Neural Networks On Spark With Keras

Jules Damji has an example of using the PyCharm IDE to use Keras to build TensorFlow neural network models on the Databricks MLflow library: Our example in the video is a simple Keras network, modified from Keras Model Examples, that creates a simple multi-layer binary classification model with a couple of hidden and dropout layers and […]

Read More

Sharing R Notebooks

Hanyu Cui and Hossein Falaki show how to share a notebook using RMarkdown: RMarkdown is the dynamic document format RStudio uses. It is normal Markdown plus embedded R (or any other language) code that can be executed to produce outputs, including tables and charts, within the document. Hence, after changing your R code, you can just rerun all […]

Read More

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031