HBase Updates in CDH 6.2

Kevin Feasel

2019-06-06

Hadoop

Krishna Maheshwari announces updates to the Cloudera Distribution of Hadoop:

Starting with CDH 6.2, Cloudera now includes the ability to use Intel’s newly released Optane Memory as an alternate destination for the 2nd tier of the bucket cache.  This deployment configuration enables you to have ~3x the size of the cache for constant cost (as compared to off-heap cache on DRAM). It does incur some additional latency compared to the traditional off-heap configuration, but our testing indicates that by allowing more (if not all) of the data’s working set to fit in the cache the set up results in a net performance improvement when the data is ultimately stored on HDFS (using HDDs).

When deploying to the cloud or using on-prem object storage, the performance improvement will be even better as object storage tends to be very expensive for random reads of small amounts of data.

There aren’t too many changes to HBase in the blog post, but the two mentioned are pretty good ones.

Related Posts

Calculating YARN Utilization Metrics

Dmitry Tolpeko shows how you can calculate per-second cluster utilization measures from YARN’s resource manager logs: But even if you query YARN REST API every second it still can only provide a snapshot of the used YARN resources. It does not show which application allocates or releases containers, their memory and CPU capacity, in which […]

Read More

Spark Streaming DStreams

Manish Mishra explains the fundamental abstraction of Spark Streaming: Before going into details of the operations available on the DStream API, let us look at the input sources from which we can start a Stream. There are multiple ways in which we can get the inputs from e.g. Kafka, Flume, etc. Or simple Idle files. […]

Read More

Categories

June 2019
MTWTFSS
« May Jul »
 12
3456789
10111213141516
17181920212223
24252627282930