Word Count In Spark 2.0

Kevin Feasel

2017-01-16

Spark

Anubhav Tarar has a word count app for Spark 2.0:

Now you have to perform the given steps:

  • Create a spark session from org.apache.spark.sql.sparksession api and specify your master and app name

  • Using the sparksession.read.txt method, read from the file wordcount.txt the return value of this method in a dataset. In case you don’t know what a data set looks like you can learn from this link.

  • Split this dataset of type string with white space and create a map which contains the occurence of each word in that data set.

  • Create a class prettyPrintMap for printing the result to console.

This Hello World app is a bit longer than the sheer minimum code necessary, as it includes a class for formatting results and some error handling.

Related Posts

Looking At Databricks Cluster Pricing

Tristan Robinson takes a look at Azure Databricks pricing: The use of databricks for data engineering or data analytics workloads is becoming more prevalent as the platform grows, and has made its way into most of our recent modern data architecture proposals – whether that be PaaS warehouses, or data science platforms. To run any […]

Read More

Data Science And Data Engineering In HDP 3.0

Saumitra Buragohain, et al, show off some of the things added to the Hortonworks Data Platform for data scientists and data engineers: We leverage the power of HDP 3.0 from efficient storage (erasure coding), GPU pooling to containerized TensorFlow and Zeppelin to enable this use case. We will the save the details for a different […]

Read More

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031