Summarizing Improvements In Spark 2.4

Anmol Sarna summarizes Apache Spark 2.4 and pushes his meme game at the same time:

The next major enhancement was the addition of a lot of new built-in functions, including higher-order functions, to deal with complex data types easier.
Spark 2.4 introduced 24 new built-in functions, such as  array_unionarray_max/min, etc., and 5 higher-order functions, such as transformfilter, etc.
The entire list can be found here.
Earlier, for manipulating the complex types (e.g. array type) directly, there are two typical solutions:
1) exploding the nested structure into individual rows, and applying some functions, and then creating the structure again.
2) building a User Defined Function (UDF).
In contrast, the new built-in functions can directly manipulate complex types, and the higher-order functions can manipulate complex values with an anonymous lambda function similar to UDFs but with much better performance.

2.4 was a big release, so check this out for a great summary of the improvements it brings.

Related Posts

Databricks Runtime 5.5

Bilal Aslam and Yifan Cao announce Databricks Runtime 5.5: Secrets API in R notebooksThe Databricks Secrets API [Azure|AWS] lets you inject secrets into notebooks without hardcoding them. As of Databricks Runtime 5.5, this API is available in R notebooks in addition to existing support for Python and Scala notebooks. You can use the dbutils.secrets.get function to obtain […]

Read More

Hooking SQL Server to Kafka

Niels Berglund has an interesting scenario for us: We see how the procedure in Code Snippet 2 takes relevant gameplay details and inserts them into the dbo.tb_GamePlay table. In our scenario, we want to stream the individual gameplay events, but we cannot alter the services which generate the gameplay. We instead decide to generate the event from the database […]

Read More

Categories

December 2018
MTWTFSS
« Nov Jan »
 12
3456789
10111213141516
17181920212223
24252627282930
31