Tungsten Engine

Kevin Feasel



Sameer Agarwal, Davies Liu, and Reynold Xin show off major Spark engine improvements:

From the above observation, a natural next step for us was to explore the possibility of automatically generating this handwritten code at runtime, which we are calling “whole-stage code generation.” This idea is inspired by Thomas Neumann’s seminal VLDB 2011 paper onEfficiently Compiling Efficient Query Plans for Modern Hardware. For more details on the paper, Adrian Colyer has coordinated with us to publish a review on The Morning Paper blog today.

The goal is to leverage whole-stage code generation so the engine can achieve the performance of hand-written code, yet provide the functionality of a general purpose engine. Rather than relying on operators for processing data at runtime, these operators together generate code at runtime and collapse each fragment of the query, where possible, into a single function and execute that generated code instead.

The possibility of getting an order of magnitude better performance is certainly enticing.

Related Posts

Apache Spark 2.3

The Databricks team has been busy.  They’ve recently announced Apache Spark 2.3 on Databricks: Continuing with the objectives to make Spark faster, easier, and smarter, Spark 2.3 marks a major milestone for Structured Streaming by introducing low-latency continuous processing and stream-to-stream joins; boosts PySpark by improving performance with pandas UDFs; and runs on Kubernetes clusters […]

Read More

Using Kafka And Elasticsearch For IoT Data

Angelos Petheriotis talks about building an IoT structure which handles ten billion messages per day: We splitted the pipeline into 2 main units: The aggregator job and the persisting job. The aggregator has one and only one responsibility. To read from the input kafka topic, process the messages and finally emit them to a new […]

Read More


May 2016
« Apr Jun »