Working With Key-Value Pairs In Spark

Teena Vashist shows us a few of the functions available with Spark for working with key-value pairs:

1. Creating Key/Value Pair RDD: 
The pair RDD arranges the data of a row into two parts. The first part is the Key and the second part is the Value. In the below example, I used a parallelize method to create a RDD, and then I used the length method to create a Pair RDD. The key is the length of the each word and the value is the word itself.

scala> val rdd = sc.parallelize(List("hello","world","good","morning"))
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:24
scala> val pairRdd = rdd.map(a => (a.length,a))
pairRdd: org.apache.spark.rdd.RDD[(Int, String)] = MapPartitionsRDD[1] at map at <console>:26
scala> pairRdd.collect().foreach(println)
(5,hello)
(5,world)
(4,good)
(7,morning)

Click through for more operations.  Spark is a bit less KV-centric than classic MapReduce jobs, but there are still plenty of places where you want to use them.

Related Posts

What’s New In Ambari 2.7

Paul Codding and Kat Petre share some of the new features in Ambari 2.7: With this release, we wanted to make Ambari more enjoyable to use every day, simplify finding and using our API, and unblock teams managing very large clusters.  Here is a preview of a few features we’re excited to share with you:Revamped […]

Read More

Working With Images In Spark 2.4

Tomas Nykodym and Weichen Xu give us an update on working with images in the most recent version of Apache Spark: An image data source addresses many of these problems by providing the standard representation you can code against and abstracts from the details of a particular image representation.Apache Spark 2.3 provided the ImageSchema.readImages API (see Microsoft’s post […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

December 2018
MTWTFSS
« Nov  
 12
3456789
10111213141516
17181920212223
24252627282930
31