Backing Up Azure Data Lake Store Data

Hugo Almeida has some hints for backing up Azure Data Lake Store data using Azure Data Factory:

Our Hadoop HDP IaaS cluster on Azure uses Azure Data Lake Store (ADLS) as its data repository and accesses it through an application user created in Azure Active Directory (AAD). Check this tutorial if you want to connect your own Hadoop cluster to ADLS.
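For readers wiring up their own cluster, here is a minimal sketch of the `core-site.xml` properties that the Hadoop ADLS (Gen1) connector uses for OAuth2 client-credential access with an AAD application. The property names come from the hadoop-azure-datalake connector; the client ID, secret and tenant values are hypothetical placeholders, and this is not necessarily how the author's cluster is configured. In production the secret would normally live in a Hadoop credential provider rather than in plain XML.

```python
import xml.etree.ElementTree as ET


def adls_core_site(client_id: str, client_secret: str, tenant_id: str) -> str:
    """Build core-site.xml properties for ADLS Gen1 access via an AAD application.

    All argument values are hypothetical placeholders supplied by the caller.
    """
    props = {
        # Authenticate as an AAD application (service principal).
        "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
        # Token endpoint for the AAD tenant that owns the application.
        "fs.adl.oauth2.refresh.url": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        "fs.adl.oauth2.client.id": client_id,
        "fs.adl.oauth2.credential": client_secret,
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(adls_core_site("my-client-id", "my-secret", "my-tenant-id"))
```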

Our ADLS is getting bigger, and we’re working on a backup strategy for it. ADLS provides locally-redundant storage (LRS); however, LRS does not protect us if our application corrupts data or accidentally deletes it. Since Microsoft hasn’t published a new version of ADLS with a clone feature, we had to find a way to back up all the data stored in our data lake.

We’re going to show you how to do a full ADLS backup with Azure Data Factory (ADF). ADF does not preserve permissions. However, our Hadoop client only accesses the AzureDataLakeStoreFilesystem (adl) through Hive with a “hive” user, so we can generate these permissions before the backup.
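To make the approach more concrete, the sketch below builds an ADF (v2-style) pipeline definition with a single Copy activity that recursively copies one ADLS dataset to another while preserving the folder hierarchy, plus the `hdfs dfs` commands one might use to (re)apply ownership and ACLs for the “hive” user, since ADF will not carry permissions over. This is an illustration under assumptions, not the author's published pipeline: the pipeline, activity and dataset names (`AdlsFullBackup`, `CopyWholeLake`, `AdlsSource`, `AdlsBackup`) and the `/data` path are hypothetical, and the datasets and linked services would need to be defined separately.

```python
import json


def adls_backup_pipeline(source_ds: str, sink_ds: str) -> dict:
    """Return an ADF pipeline definition for a full ADLS-to-ADLS copy.

    source_ds and sink_ds are hypothetical dataset names defined elsewhere in ADF.
    """
    return {
        "name": "AdlsFullBackup",  # hypothetical pipeline name
        "properties": {
            "activities": [
                {
                    "name": "CopyWholeLake",  # hypothetical activity name
                    "type": "Copy",
                    "inputs": [{"referenceName": source_ds, "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_ds, "type": "DatasetReference"}],
                    "typeProperties": {
                        # Walk every subfolder of the source dataset's path.
                        "source": {"type": "AzureDataLakeStoreSource", "recursive": True},
                        # Keep the same relative folder layout in the backup.
                        "sink": {"type": "AzureDataLakeStoreSink", "copyBehavior": "PreserveHierarchy"},
                    },
                }
            ]
        },
    }


def hive_permission_commands(root: str, user: str = "hive") -> list[str]:
    """Build shell commands that hand a tree to the Hive user, since ADF drops ACLs."""
    return [
        f"hdfs dfs -chown -R {user} {root}",
        f"hdfs dfs -setfacl -R -m user:{user}:rwx {root}",
    ]


if __name__ == "__main__":
    print(json.dumps(adls_backup_pipeline("AdlsSource", "AdlsBackup"), indent=2))
    for cmd in hive_permission_commands("/data"):
        print(cmd)
```

Because only the “hive” user touches the data, a small, predictable set of ownership and ACL commands is enough to restore access, which is what makes losing permissions during the ADF copy acceptable here.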

Read the whole thing if you’re thinking of using Azure Data Lake Store.
