Pivoting Spark DataFrames

Unmesha Sreeveni shows how we can pivot a DataFrame in Apache Spark using one line of code:

A pivot can be thought of as translating rows into columns while applying one or more aggregations.

Lets see how we can achieve the same using the above dataframe.

We will pivot the data based on “Item” column.

Click through for the code. This is an area where dropping back into Scala or Python is a lot more lines-of-code efficient than sticking to SQL.

Related Posts

Controlling Partition and File Counts in Spark

Landon Robinson shows how we can control the number of partitions (and therefore the number of output files) on reduce-style jobs in Spark: Whatever the case may be, the desire to control the number of files for a job or query is reasonable – within, ahem, reason – and in general is not too complicated. And, it’s often […]

Read More

Creating an Azure Databricks Cluster

Brad Llewellyn shows how you can create an Azure Databricks cluster: There are three major concepts for us to understand about Azure Databricks, Clusters, Code and Data.  We will dig into each of these in due time.  For this post, we’re going to talk about Clusters.  Clusters are where the work is done.  Clusters themselves […]

Read More

Categories

April 2019
MTWTFSS
« Mar May »
1234567
891011121314
15161718192021
22232425262728
2930