Spark Data Structures

Kevin Feasel

2017-07-31

Spark

Shubham Agarwal explains the difference between three Spark data structures:

DataFrame (DF) –

A DataFrame is an abstraction which gives a schema view of data. That means it gives us a view of the data as columns, with column names and type information; we can think of the data in a DataFrame like a table in a database.

Like RDDs, execution on DataFrames is lazily triggered.
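
To make the schema and lazy-execution points concrete, here is a minimal Scala sketch; the session settings, column names, and sample data are invented for illustration:

import org.apache.spark.sql.SparkSession

object DataFrameSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session for demonstration purposes.
    val spark = SparkSession.builder()
      .appName("DataFrameSketch")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    // A DataFrame carries a schema: column names and types, much like a database table.
    val people = Seq(("Alice", 34), ("Bob", 45)).toDF("name", "age")
    people.printSchema()

    // Transformations such as filter() are lazy; nothing executes
    // until an action like show() is called.
    val adults = people.filter($"age" >= 40)
    adults.show()

    spark.stop()
  }
}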

Read on to learn more about Resilient Distributed Datasets, DataFrames, and DataSets.

