Impala Improvements in CDH 5.15.0

Kevin Feasel

2019-01-31

Hadoop

Michael Ho, et al, share some improvements in Apache Impala’s scalability in the Cloudera Distribution of Hadoop:

Kudu RPC (KRPC) supports asynchronous RPCs. This removes the need to have a single thread per connection. Connections between hosts are long-lived. All RPCs between two hosts multiplex on the same established connection. This drastically cuts down the number of TCP connections between hosts and decouples the number of connections from the number of query fragments.

The error handling semantics are much cleaner and the RPC library transparently re-establishes broken connections. Support for SASL and TLS are built-in. KRPC uses protocol buffers for payload serialization. In addition to structured data, KRPC also supports attaching binary data payloads to RPCs, which removes the cost of data serialization and is used for large data objects like Impala’s intermediate row batches. There is also support for RPC cancellation which comes in handy when a query is cancelled because it allows query teardown to happen sooner.

Looks like there were some pretty nice gains out of this project.

Related Posts

Flink: Batch as a Special Case of Streaming

Fabian Hueske and Aljoscha Krettek describe streaming versus batch processing in Apache Flink: The Apache Flink project has followed the philosophy of taking a unified approach to batch and stream data processing, building on the core paradigm of “continuous processing of unbounded data streams” for a long time. If you think about it, carrying out […]

Read More

Multi-Tenant Security in Kudu + Impala

Grant Henke shows how you can combine Apache Impala’s fine-grained authorization with Apache Kudu’s coarse-grained authentication for multi-tenant scenarios: Kudu supports coarse-grained authorization of client requests based on the authenticated client Kerberos principal. The two levels of access which can be configured are:1. Superuser – principals authorized as a superuser are able to perform certain administrative […]

Read More

Categories

January 2019
MTWTFSS
« Dec Feb »
 123456
78910111213
14151617181920
21222324252627
28293031