Improving HBase Cluster Restart Time

Nitin Verma explains how to re-create an HBase cluster a bit faster:

When flush ‘table’ operation is triggered, all the regions belonging to that table will flush independently. Once the HFile corresponding to a region is flushed, it records the max sequence id in metadata and notifies the WAL corresponding to the regionserver. WAL maintains a mapping table for regions and their corresponding flushed sequence id’s. When the HBase cluster restarts, the hMaster will distribute flushed sequence id’s per region to the recovery threads splitting the WAL, so that they can skip the edits which have already been persisted in HFiles.

This is particularly important for clusters which frequently spin up and down, a feature of Platform-as-a-Service solutions like HDInsight.

Related Posts

Building TensorFlow Neural Networks On Spark With Keras

Jules Damji has an example of using the PyCharm IDE to use Keras to build TensorFlow neural network models on the Databricks MLflow library: Our example in the video is a simple Keras network, modified from Keras Model Examples, that creates a simple multi-layer binary classification model with a couple of hidden and dropout layers and […]

Read More

Creating SQL Server Images In Azure Container Registry

Andrew Pruski shows us how to save Docker container images to the Azure Container Registry using Powershell: Awesome! Our custom image is in our ACR! But has it worked? Has it really? Oh ye of little faith… I guess the only way to find out is to run a container! So let’s run a Azure […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930