Pivoting With Spark SQL

MaryAnn Xue shows us how to use the PIVOT operator in Spark SQL:

Pivot was first introduced in Apache Spark 1.6 as a new DataFrame feature that allows users to rotate a table-valued expression by turning the unique values from one column into individual columns.

The upcoming Apache Spark 2.4 release extends this powerful functionality of pivoting data to our SQL users as well. In this blog, using temperatures recordings in Seattle, we’ll show how we can use this common SQL Pivot feature to achieve complex data transformations.

The syntax is quite similar to the PIVOT syntax that SQL Server uses.

Related Posts

Testing Kafka Streams Applications

Yeva Byzek continues her series on testing Kafka-based streaming applications: When you create a stream processing application with Kafka’s Streams API, you create a Topologyeither using the StreamsBuilder DSL or the low-level Processor API. Normally, the topology runs with the KafkaStreams class, which connects to a Kafka cluster and begins processing when you call start(). For testing though, connecting to a running […]

Read More

Auto ML With SQL Server 2019 Big Data Clusters

Marco Inchiosa has a model scenario for using Big Data Clusters to scale out a machine learning problem: H2O provides popular open source software for data science and machine learning on big data, including Apache SparkTM integration. It provides two open source python AutoML classes: h2o.automl.H2OAutoML and pysparkling.ml.H2OAutoML. Both APIs use the same underlying algorithm implementations, […]

Read More

Categories

November 2018
MTWTFSS
« Oct Dec »
 1234
567891011
12131415161718
19202122232425
2627282930