Improving HBase Cluster Restart Time

Nitin Verma explains how to re-create an HBase cluster a bit faster:

When flush ‘table’ operation is triggered, all the regions belonging to that table will flush independently. Once the HFile corresponding to a region is flushed, it records the max sequence id in metadata and notifies the WAL corresponding to the regionserver. WAL maintains a mapping table for regions and their corresponding flushed sequence id’s. When the HBase cluster restarts, the hMaster will distribute flushed sequence id’s per region to the recovery threads splitting the WAL, so that they can skip the edits which have already been persisted in HFiles.

This is particularly important for clusters which frequently spin up and down, a feature of Platform-as-a-Service solutions like HDInsight.

Related Posts

Hooking SQL Server to Kafka

Niels Berglund has an interesting scenario for us: We see how the procedure in Code Snippet 2 takes relevant gameplay details and inserts them into the dbo.tb_GamePlay table. In our scenario, we want to stream the individual gameplay events, but we cannot alter the services which generate the gameplay. We instead decide to generate the event from the database […]

Read More

Oracle Data Guard on Azure

Kellyn Pot’vin-Gorman’s worlds continue to collide: So, as most people know, I’m not a big fan of Oracle RAC, (Real Application Cluster).  My opinion was that it was often sold for use cases that it doesn’t serve, (such as HA) and the resource demands between the nodes, as well as what happens when a node […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930