Word Count In Spark 2.0

Kevin Feasel

2017-01-16

Spark

Anubhav Tarar has a word count app for Spark 2.0:

Now you have to perform the following steps:

  • Create a SparkSession from the org.apache.spark.sql.SparkSession API and specify your master and app name.

  • Using the sparkSession.read.textFile method, read from the file wordcount.txt; this method returns a Dataset of strings. If you don’t know what a Dataset looks like, you can learn from this link.

  • Split this Dataset of type String on whitespace and create a map which contains the occurrence of each word in that Dataset.

  • Create a class prettyPrintMap for printing the result to the console.

This Hello World app is a bit longer than the bare minimum code necessary, as it includes a class for formatting results and some error handling.
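
For a sense of how those steps fit together, here is a minimal sketch of a Spark 2.0 word count in Scala. The file name wordcount.txt comes from the steps above; the local[*] master is an assumption for running on a single machine, and the plain println loop is a hypothetical stand-in for the post's prettyPrintMap class.

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Step 1: create a SparkSession with a master and app name
    // (local[*] is an assumption for a single-machine run)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("WordCount")
      .getOrCreate()

    import spark.implicits._

    // Step 2: read wordcount.txt into a Dataset[String]
    val lines = spark.read.textFile("wordcount.txt")

    // Step 3: split on whitespace and count occurrences of each word
    val counts: Map[String, Long] = lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupByKey(identity)
      .count()
      .collect()
      .toMap

    // Step 4: print the result to the console (a plain stand-in for
    // the post's prettyPrintMap class)
    counts.foreach { case (word, n) => println(s"$word : $n") }

    spark.stop()
  }
}

Collecting the counts into a local Map is fine for a small sample file like this; for large inputs you would keep the result as a Dataset and write it out instead.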

Related Posts

ACID Transactions on Spark

Achilleus explains one of the big announcements at Spark+AI Summit 2019: Delta Lake is basically a compute layer that would sit on top of your existing on-prem HDFS cluster, your favourite cloud storage, or even run locally on your laptop (best part)! Data is stored on the above-mentioned storage as versioned Parquet files. Any data that is read using […]


Spark+AI Summit 2019 Announcements

Victoria Holt creates a roundup of Spark+AI Summit 2019 announcements: Rohan Kumar of Microsoft announced .NET for Apache Spark, making Apache Spark accessible to .NET developers (GitHub). I’m very happy that the Spark for .NET team added F# support. Spark is so much nicer when using functional programming.

