Maximum Temperatures With Spark Languages

Kevin Feasel

2016-11-04

Spark

Praveen Sripati has a two-part series on getting aggregates by year in various Spark languages.  In part one, he looks at Python:

Hadoop – The Definitive Guide revolves around the example of finding the maximum temperature for a particular year from the weather data set. The code for the same is here and the data here. Below is the Spark code implemented in Python for the same.

In part 2, he looks at Spark SQL:

In the previous blog, we looked at how find out the maximum temperature of each year from the weather dataset. Below is the code for the same using Spark SQL which is a layer on top of Spark. SQL on Spark was supported using Shark which is being replaced by Spark SQL.Here is a nice blog from DataBricks on the future of SQL on Spark.

There’s no Scala example here, but it’s pretty straightforward as well.

Related Posts

Using Sqoop’s Logic To Improve Spark JDBC Performance

Avi Yehuda analyzes how Sqoop works to make relational database access from Spark faster: Sqoop performed so much better almost instantly, all you needed to do is to set the number of mappers according to the size of the data and it was working perfectly.Since both Spark and Sqoop are based on the Hadoop map-reduce […]

Read More

Summarizing Improvements In Spark 2.4

Anmol Sarna summarizes Apache Spark 2.4 and pushes his meme game at the same time: The next major enhancement was the addition of a lot of new built-in functions, including higher-order functions, to deal with complex data types easier.Spark 2.4 introduced 24 new built-in functions, such as  array_union, array_max/min, etc., and 5 higher-order functions, such as transform, filter, etc.The entire […]

Read More

Categories

November 2016
MTWTFSS
« Oct Dec »
 123456
78910111213
14151617181920
21222324252627
282930