Summarizing Improvements In Spark 2.4

Anmol Sarna summarizes Apache Spark 2.4 and pushes his meme game at the same time:

The next major enhancement was the addition of a lot of new built-in functions, including higher-order functions, to deal with complex data types easier.
Spark 2.4 introduced 24 new built-in functions, such as  array_unionarray_max/min, etc., and 5 higher-order functions, such as transformfilter, etc.
The entire list can be found here.
Earlier, for manipulating the complex types (e.g. array type) directly, there are two typical solutions:
1) exploding the nested structure into individual rows, and applying some functions, and then creating the structure again.
2) building a User Defined Function (UDF).
In contrast, the new built-in functions can directly manipulate complex types, and the higher-order functions can manipulate complex values with an anonymous lambda function similar to UDFs but with much better performance.

2.4 was a big release, so check this out for a great summary of the improvements it brings.

Related Posts

Testing Kafka Streams Applications

Yeva Byzek continues her series on testing Kafka-based streaming applications: When you create a stream processing application with Kafka’s Streams API, you create a Topologyeither using the StreamsBuilder DSL or the low-level Processor API. Normally, the topology runs with the KafkaStreams class, which connects to a Kafka cluster and begins processing when you call start(). For testing though, connecting to a running […]

Read More

Auto ML With SQL Server 2019 Big Data Clusters

Marco Inchiosa has a model scenario for using Big Data Clusters to scale out a machine learning problem: H2O provides popular open source software for data science and machine learning on big data, including Apache SparkTM integration. It provides two open source python AutoML classes: h2o.automl.H2OAutoML and pysparkling.ml.H2OAutoML. Both APIs use the same underlying algorithm implementations, […]

Read More

Categories

December 2018
MTWTFSS
« Nov Jan »
 12
3456789
10111213141516
17181920212223
24252627282930
31