Press "Enter" to skip to content

Set Operations In Spark

Fisseha Berhane compares SparkSQL, DataFrames, and classic RDDs when performing certain set-based operations:

In this fourth part, we will see set operators in Spark the RDD way, the DataFrame way and the SparkSQL way.
Also, check out my other recent blog posts on Spark on Analyzing the Bible and the Quran using Spark and Spark DataFrames: Exploring Chicago Crimes.

The data and the notebooks can be downloaded from my GitHub repository.
The three types of set operators in RDD, DataFrame and SQL approach are shown below.

This is where SparkSQL (and SQL in general) shines, although the DataFrame approach is also compact.