Spark Data Structures

Kevin Feasel

2017-07-31

Spark

Shubham Agarwal explains the difference between three Spark data structures:

DataFrame(DF) – 

DataFrame is an abstraction which gives a schema view of data. Which means it gives us a view of data as columns with column name and types info, We can think data in data frame like a table in the database.

Like RDD, execution in Dataframe too is lazy triggered.

Read on to learn more about Resilient Distributed Datasets, DataFrames, and DataSets.

Related Posts

Working With Images In Spark 2.4

Tomas Nykodym and Weichen Xu give us an update on working with images in the most recent version of Apache Spark: An image data source addresses many of these problems by providing the standard representation you can code against and abstracts from the details of a particular image representation.Apache Spark 2.3 provided the ImageSchema.readImages API (see Microsoft’s post […]

Read More

Using Databricks Delta In Lieu Of Lambda Architecture

Jose Mendes contrasts the Lambda architecture with the Databricks Delta architecture and gives us a quick example of using Databricks Delta: The major problem of the Lambda architecture is that we have to build two separate pipelines, which can be very complex, and, ultimately, difficult to combine the processing of batch and real-time data, however, […]

Read More

Categories

July 2017
MTWTFSS
« Jun Aug »
 12
3456789
10111213141516
17181920212223
24252627282930
31