Tim Spann explains how to handle backup, recovery, and disaster recovery on Hadoop:
You can mirror datasets with Falcon. Mirroring is a very useful option for enterprises and is well documented. This is something you may want to have validated by a third party. See the following resources:
- Hive DR with Falcon.
- Data movement and integration (this overview from Hortonworks is very useful for practical data movement between and within clusters).
- Falcon details (in-depth presentation).
Tim shows several recovery options, which makes this useful reading if you use Hadoop as a source system for anything (or if you can’t afford for it to be down for 2-3 days while you recover data).