Hive Going In-Memory

Kevin Feasel



Carter Shanklin and Nita Dembla discuss Hive memory-handling optimizations:

Let’s put this architecture to the test with a realistic dataset size and workload. Our previous performance blog, “Announcing Apache Hive 2.1: 25x Faster Queries and Much More”, discussed 4 reasons that LLAP delivers dramatically faster performance versus Hive on Tez. In that benchmark we saw 25+x performance boosts on ad-hoc queries with a dataset that fit entirely into the cluster’s memory.

In most cases, datasets will be far too large to fit in RAM so we need to understand if LLAP can truly tackle the big data challenge or if it’s limited to reporting roles on smaller datasets. To find out, we scaled the dataset up to 10 TB, 4x larger than aggregate cluster RAM, and we ran a number of far more complex queries.

Table 3 below shows how Hive LLAP is capable of running both At Speed and At Scale. The simplest query in the benchmark ran in 2.68 seconds on this 10 TB dataset while the most complex query, Query 64 performed a total of 37 joins and ran for more than 20 minutes.

Given how much faster memory is than disk, and given Spark’s broad adoption, this makes sense as a strategy for Hive’s continued value.

Related Posts

An Apache Sqoop Tutorial

Subham Sinha has an introductory-level tutorial on Apache Sqoop: For Hadoop developer, the actual game starts after the data is being loaded in HDFS. They play around this data in order to gain various insights hidden in the data stored in HDFS. So, for this analysis the data residing in the relational database management systems […]

Read More

Getting Started With Zeppelin

Sangeeta Gulia shows us how to get started building notebooks with Apache Zeppelin on top of Spark: There are 3 interpreter modes available in Zeppelin. 1) Shared Mode In Shared mode, a SparkContext and a Scala REPL is being shared among all interpreters in the group. So every Note will be sharing single SparkContext and single […]

Read More


October 2016
« Sep Nov »