Using Hive LLAP On ElasticMapReduce

Jigar Mistry shows how to configure and use Hive LLAP on AWS’s ElasticMapReduce:

With many options available in the market (Presto, Spark SQL, etc.) for doing interactive SQL  over data that is stored in Amazon S3 and HDFS, there are several reasons why using Hive and LLAP might be a good choice:

  • For those who are heavily invested in the Hive ecosystem and have external BI tools that connect to Hive over JDBC/ODBC connections, LLAP plugs in to their existing architecture without a steep learning curve.

  • It’s compatible with existing Hive SQL and other Hive tools, like HiveServer2, and JDBC drivers for Hive.

  • It has native support for security features with authentication and authorization (SQL standards-based authorization) using HiveServer2.

  • LLAP daemons are aware about of the columns and records that are being processed which enables you to enforce fine-grained access control.

  • It can use Hive’s vectorization capabilities to speed up queries, and Hive has better support for Parquet file format when vectorization is enabled.

  • It can take advantage of a number of Hive optimizations like merging multiple small files for query results, automatically determining the number of reducers for joins and groupbys, etc.

  • It’s optional and modular so it can be turned on or off depending on the compute and resource requirements of the cluster. This lets you to run other YARN applications concurrently without reserving a cluster specifically for LLAP.

Read on for more details, including the bootstrap action you need to take and how to use LLAP once you have it configured.

Related Posts

Working With Images In Spark 2.4

Tomas Nykodym and Weichen Xu give us an update on working with images in the most recent version of Apache Spark: An image data source addresses many of these problems by providing the standard representation you can code against and abstracts from the details of a particular image representation.Apache Spark 2.3 provided the ImageSchema.readImages API (see Microsoft’s post […]

Read More

Comparing Streaming Engines

George Vetticaden compares Spark Streaming, Storm, and Kafka Streams: Before the addition of Kafka Streams support, HDP and HDF supported two stream processing engines:  Spark Structured Streaming and Streaming Analytics Manager (SAM) with Storm. So naturally, this begets the following question:Why add a third stream processing engine to the platform?With the choice of using Spark […]

Read More

Categories

August 2017
MTWTFSS
« Jul Sep »
 123456
78910111213
14151617181920
21222324252627
28293031