Euan Garden shares some numbers around Apache Spark performance in Azure Synapse Analytics:
To compare the performance, we derived queries from TPC-DS with 1TB scale and ran them on 8 nodes Azure E8V3 cluster (15 executors – 28g memory, 4 cores). Even though our version running inside Azure Synapse today is a derivative of Apache Spark™ 2.4.4, we compared it with the latest open-source release of Apache Spark™ 3.0.1 and saw Azure Synapse was 2x faster in total runtime for the Test-DS comparison.
Click through for several techniques the Azure Synapse Analytics team has implemented to make some significant performance improvements. It’s still slower than Databricks, but considerably faster than the open-source Apache Spark baseline.