Bulk Loading HDInsight Using Phoenix

Anunay Tiwari uses Phoenix to bulk load data into HBase on HDInsight:

Apache HBase is an open Source No SQL Hadoop database, a distributed, scalable, big data store. It provides real-time read/write access to large datasets. HDInsight HBase is offered as a managed cluster that is integrated into the Azure environment. HBase provides many features as a big data store. But in order to use HBase, the customers have to first load their data into HBase.

There are multiple ways to get data into HBase such as – using client API’s, Map Reduce job with TableOutputFormat or inputting the data manually on HBase shell. Many customers are interested in using Apache Phoenix – a SQL layer over HBase for its ease of use. The current post describes about how to use phoenix bulk load with HDinsight clusters.

Phoenix provides two methods for loading CSV data into Phoenix tables – a single-threaded client loading tool via the psql command, and a MapReduce-based bulk load tool.

Anunay explains both methods, allowing you to choose based on your data needs.

Related Posts

Alerting In Azure SQL Database

Arun Sirpal shows how to set up an alert for an Azure SQL Database: I keep things simple and like to look at certain performance based metrics but before talking about what metrics are available let’s step through an example. For this post I want to setup an alert for CPU percentage utilised that when […]

Read More

Connect(); Announcements, Including Azure Databricks

James Serra has a wrapup of Microsoft Connect(); announcements around the data platform space: Microsoft Connect(); is a developer event from Nov 15-17, where plenty of announcements are made.  Here is a summary of the data platform related announcements: Azure Databricks: In preview, this is a fast, easy, and collaborative Apache Spark based analytics platform optimized for Azure. […]

Read More

Categories

February 2017
MTWTFSS
« Jan Mar »
 12345
6789101112
13141516171819
20212223242526
2728