reduceByKey and aggregateByKey in Spark

Published 2021-04-21 by Kevin Feasel

The Hadoop in Real World team compares two functions against RDDs in Spark:

Let’s examine the call aggregateByKey(0)(_+_, _+_). The first parameter, 0, is the initial value, and its type also determines the type of the output.
The first _+_ is the map-side combine function, which folds each value into the running result within a partition. The second _+_ is the reduce-side combine function, which merges the per-partition results for each key. Both functions are the same in this case.
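
To make the comparison concrete, here is a minimal sketch that is not taken from the linked post. It assumes Spark is on the classpath and runs in local mode. The object name ByKeyDemo, the sales data, and the variable names are hypothetical and chosen only for illustration. It shows reduceByKey and aggregateByKey producing the same per-key sums, and then an aggregateByKey call whose initial value changes the output type to a (sum, count) pair, which reduceByKey cannot do directly because its result must have the same type as the values.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; assumes Spark is available on the classpath.
object ByKeyDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ByKeyDemo")
      .master("local[2]") // local mode, for illustration only
      .getOrCreate()
    val sc = spark.sparkContext

    // Illustrative data: (key, value) pairs spread over two partitions.
    val sales = sc.parallelize(
      Seq(("apples", 3), ("pears", 5), ("apples", 4), ("pears", 1)),
      numSlices = 2
    )

    // reduceByKey: one function, and the result type equals the value type.
    val viaReduce = sales.reduceByKey(_ + _)

    // aggregateByKey: initial value 0, then the map-side and reduce-side functions.
    val viaAggregate = sales.aggregateByKey(0)(_ + _, _ + _)

    // The initial value (0, 0) makes the output a (sum, count) pair per key.
    val sumAndCount = sales.aggregateByKey((0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1),   // within a partition: fold in one value
      (a, b) => (a._1 + b._1, a._2 + b._2)    // across partitions: merge partial results
    )

    // Both sums give apples -> 7 and pears -> 6.
    println(viaReduce.collect().toMap)
    println(viaAggregate.collect().toMap)
    // apples -> (7, 2) and pears -> (6, 2)
    println(sumAndCount.collect().toMap)

    spark.stop()
  }
}
```

In the last call the two functions differ, because the first combines an accumulator with a raw value while the second combines two accumulators. That is the case where aggregateByKey is needed rather than reduceByKey.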

This is a demo-driven post, so check it out.

Published in Hadoop and Spark
