Anuj Saxena takes us through some of the pros and cons of using the Catalyst Optimizer in Spark, including a couple of issues:
I am sure the optimizations make the calculation time very short and these optimizations are implemented in such a way that you just have to provide the logic and everything else will be done in abstraction. But as my friend and colleague Ramandeep says “Abstract features come with abstract issues”. So following are the few issues which I have faced in my recent interaction with Spark SQL:
1. Too large of a query to be stored in memory
2. Implicit optimizations interfere with partitioning
Click through for examples of this.