Partition Handling In Spark 2.1

Eric Liang, et al, discuss a change to Spark 2.1 which will make certain partitioned table access faster:

In Spark 2.1, we drastically improve the initial latency of queries that touch a small fraction of table partitions. In some cases, queries that took tens of minutes on a fresh Spark cluster now execute in seconds. Our improvements cut down on table memory overheads, and make the SQL experience starting cold comparable to that on a “hot” cluster with table metadata fully cached in memory.

This looks like a nice improvement in Spark.

Related Posts

SparkSession Versus SparkContext

Abhishek Baranwal explains the differences between the SparkSession object and the SparkContext object when writing Spark code: Prior to spark 2.0, SparkContext was used as a channel to access all spark functionality. The spark driver program uses sparkContext to connect to the cluster through resource manager. SparkConf is required to create the spark context object, […]

Read More

Hadoop + SQL Server In 2019

Travis Wright shows off a big part of what the SQL Server team has been working on the last couple of years: SQL Server 2019 big data clusters provide a complete AI platform. Data can be easily ingested via Spark Streaming or traditional SQL inserts and stored in HDFS, relational tables, graph, or JSON/XML. Data […]

Read More

Categories

December 2016
MTWTFSS
« Nov Jan »
 1234
567891011
12131415161718
19202122232425
262728293031