Impala Improvements in CDH 5.15.0

Kevin Feasel

2019-01-31

Hadoop

Michael Ho, et al., share some improvements to Apache Impala’s scalability in the Cloudera Distribution of Hadoop:

Kudu RPC (KRPC) supports asynchronous RPCs. This removes the need to have a single thread per connection. Connections between hosts are long-lived. All RPCs between two hosts multiplex on the same established connection. This drastically cuts down the number of TCP connections between hosts and decouples the number of connections from the number of query fragments.

The error handling semantics are much cleaner and the RPC library transparently re-establishes broken connections. Support for SASL and TLS is built in. KRPC uses protocol buffers for payload serialization. In addition to structured data, KRPC also supports attaching binary data payloads to RPCs, which removes the cost of data serialization and is used for large data objects like Impala’s intermediate row batches. There is also support for RPC cancellation, which comes in handy when a query is cancelled because it allows query teardown to happen sooner.
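To make the multiplexing and cancellation ideas a bit more concrete, here is a minimal sketch in Python using asyncio. This is not KRPC's API (KRPC is a C++ library); the `MultiplexedConnection` class, its `call` method, and the `in_flight` property are hypothetical names invented for illustration, and the "remote side" is simulated with a timer. The sketch only shows the general pattern: many in-flight calls share one connection object, each call is tracked by an ID rather than by its own thread or socket, and cancelling a call frees its bookkeeping right away.

```python
import asyncio
import itertools


class MultiplexedConnection:
    """Toy stand-in for one long-lived connection shared by many in-flight calls.

    Hypothetical illustration only; this is not how KRPC is implemented.
    """

    def __init__(self):
        self._call_ids = itertools.count(1)
        self._pending = {}  # call id -> Future waiting for its response

    async def call(self, payload, delay=0.01):
        loop = asyncio.get_running_loop()
        call_id = next(self._call_ids)
        future = loop.create_future()
        self._pending[call_id] = future
        # Simulate the remote host answering after `delay` seconds.
        loop.call_later(delay, self._on_response, call_id, payload.upper())
        try:
            return await future
        finally:
            # Runs on success and on cancellation, so a cancelled call
            # stops occupying a slot immediately.
            self._pending.pop(call_id, None)

    def _on_response(self, call_id, result):
        future = self._pending.get(call_id)
        if future is not None and not future.done():
            future.set_result(result)

    @property
    def in_flight(self):
        return len(self._pending)


async def demo():
    conn = MultiplexedConnection()

    # Three "query fragments" issue calls concurrently over the same connection.
    results = await asyncio.gather(*(conn.call(f"batch-{i}") for i in range(3)))

    # Start a slow call, then cancel it, as a cancelled query would.
    slow = asyncio.create_task(conn.call("slow-batch", delay=60))
    await asyncio.sleep(0)  # let the task start and register its call
    in_flight_before = conn.in_flight
    slow.cancel()
    try:
        await slow
    except asyncio.CancelledError:
        pass
    return results, in_flight_before, conn.in_flight


results, in_flight_before, in_flight_after = asyncio.run(demo())
```

The point of the design is that the number of connection objects stays at one no matter how many calls are outstanding, and the cancelled call is cleaned up without waiting for its (simulated) 60-second response.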

Looks like there were some pretty nice gains out of this project.

