HBase Incremental Backup And Restore

Carter Shanklin and Vladimir Rodionov discuss incremental backup and restore coming to HBase & Phoenix:

If your tables are large it may not be possible to restore them under a different name due to space constraints. The really powerful thing about HBase backups is they are stored in WAL files that can be parsed using a simple interface that can be consumed either in Java or using the “hbase wal” utility.

Consider this scenario: A customer rep deleted some data because he thought it was unimportant. A week later the customer is upset because the data was important and you need to restore these few pieces of information. With HBase backups all you need to do is parse through the backups with a WAL reader and extract the historical values, which you can then add back in. With other databases you would have to bring another database instance online and load the backups into it. Having backups in open, well-understood formats unlocks many powerful opportunities and can bring recovery times down from days to minutes.
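To make the recovery scenario concrete, here is a minimal sketch of the "extract the historical values" step. It assumes the WAL entries have already been decoded into plain Python dicts, for example by a Java WAL reader or by post-processing the output of the "hbase wal" utility. The record shape (row, family, qualifier, timestamp, type, value) and the names recover_row and sample_edits are hypothetical, chosen for illustration; they are not the actual WAL file format or any HBase API.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical, already-decoded WAL cells. Real WAL entries would need to be
# parsed first; this dict layout is an assumption made for the example.
sample_edits = [
    {"row": "cust-42", "family": "info", "qualifier": "email",
     "timestamp": 100, "type": "Put", "value": "old@example.com"},
    {"row": "cust-42", "family": "info", "qualifier": "email",
     "timestamp": 200, "type": "Put", "value": "new@example.com"},
    {"row": "cust-42", "family": "info", "qualifier": "phone",
     "timestamp": 150, "type": "Put", "value": "555-0100"},
    # The accidental delete issued by the customer rep.
    {"row": "cust-42", "family": "info", "qualifier": "email",
     "timestamp": 300, "type": "Delete", "value": None},
    # An unrelated row that must be ignored.
    {"row": "cust-7", "family": "info", "qualifier": "email",
     "timestamp": 120, "type": "Put", "value": "other@example.com"},
]


def recover_row(
    edits: Iterable[dict], row: str, before_ts: int
) -> Dict[Tuple[str, str], Optional[str]]:
    """Return the newest Put value per (family, qualifier) for `row`,
    considering only edits written strictly before `before_ts`."""
    latest: Dict[Tuple[str, str], Tuple[int, Optional[str]]] = {}
    for edit in edits:
        if edit["row"] != row or edit["type"] != "Put":
            continue  # skip other rows and non-Put edits such as deletes
        if edit["timestamp"] >= before_ts:
            continue  # skip anything at or after the cutoff
        column = (edit["family"], edit["qualifier"])
        if column not in latest or edit["timestamp"] > latest[column][0]:
            latest[column] = (edit["timestamp"], edit["value"])
    # Drop the timestamps and keep only the recovered values.
    return {column: value for column, (_, value) in latest.items()}


# Recover the state of the row as it was just before the delete at t=300.
recovered = recover_row(sample_edits, "cust-42", before_ts=300)
```

The recovered values could then be written back to the live table with ordinary puts. Using the delete's timestamp as the cutoff is one possible design choice; a real recovery would also need to decide how to treat edits made after the delete.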

Read on if you manage a Hadoop cluster with HBase (or you’re likely to administer one soon).

July 2016