Divyansh Jain shares a couple of tips for optimizing Apache Spark code:
1. Avoid UDFs. But why?
Because Catalyst treats a UDF as a black box: it cannot look inside the function, so it cannot optimize it, and you lose the optimizations Spark would otherwise apply to that part of the query. Instead, try using the Spark SQL API to develop your application.
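To make the tip concrete, here is a minimal sketch (not taken from the linked post) that upper-cases a column twice, once with a UDF and once with the built-in `upper` function, and prints both query plans. The object name `UdfVsBuiltin`, the sample data, and the column names are assumptions made up for illustration; it assumes Spark SQL is on the classpath and runs in local mode.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf, upper}

// Hypothetical example object; the name and data are for illustration only.
object UdfVsBuiltin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-vs-builtin")
      .master("local[*]") // assumption: local run, no cluster
      .getOrCreate()
    import spark.implicits._

    // Small in-memory DataFrame with a single string column.
    val people = Seq("alice", "bob").toDF("name")

    // Version 1: a UDF. Catalyst sees only an opaque function call.
    val upperUdf = udf((s: String) => s.toUpperCase)
    val withUdf = people.select(upperUdf(col("name")).as("name_upper"))

    // Version 2: the built-in function, which Catalyst understands as an expression.
    val withBuiltin = people.select(upper(col("name")).as("name_upper"))

    // Print the parsed, analyzed, optimized, and physical plans for comparison.
    withUdf.explain(true)
    withBuiltin.explain(true)

    spark.stop()
  }
}
```

Comparing the two `explain(true)` outputs should show the UDF appearing as an opaque call in the plan, while the built-in version appears as a native expression that the optimizer can reason about. The difference tends to matter more in larger queries, for example when the function is used inside a filter.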
Click through for a demo and for the second tip.