
RDD vs DataFrame vs Dataset in Spark

The Hadoop in Real World team disambiguates three APIs:

RDD, DataFrame, and Dataset are all Spark APIs, introduced at different points in time. The goal of these APIs is to help us work with large datasets in a distributed fashion, with performance in mind.

Click through for the comparison.