Miles Cole does a bit of testing:
First, let’s revisit the purpose of the benchmark: The objective is to explore data engineering engines available in Fabric to understand whether Spark with vectorized execution (the Native Execution Engine) should be considered in small data architectures.
Beyond refreshing the benchmark to see if any core findings have changed, I do want to expand in a few areas where I got great feedback from the community:
I really appreciate the approach behind this, both in terms of sticking to more realistic data sizes for many operations as well as performing this test given all of the recent improvements in each engine.