Using The Spark-HBase Connector

Anunay Tiwari shows how to use the Spark-HBase connector in HDInsight:

The Spark-Hbase Connector provides an easy way to store and access data from HBase clusters with Spark jobs. HBase is really successful for highest level of data scale needs. Thus, existing Spark customers should definitely explore this storage option. Similarly, if the customers are already having HDinsight HBase clusters and they want to access their data by Spark jobs then there is no need to move data to any other storage medium. In both the cases, the connector will be extremely useful.

I’m not the biggest fan of HBase, but if it’s part of your environment, you should definitely look at this Spark connector.

Related Posts

Async Processing With Azure Analysis Services

Teo Lachev notes that you can process Azure Analysis Services cubes without maintaining an HTTP connection: AAS supports processing tasks asynchronously with REST APIs. The difference is that the service component (REST API) maintains the connectivity to the server – thus reducing the chances of HTTP disconnections from the external application. Microsoft has provided a RestAPISample console app to help you get […]

Read More

Azure Data Factory v2 And Decompression

Ben Jarvis notes a file naming bug with Azure Data Factory v2 when decompressing files: ADF V2 natively supports decompression of files as documented at https://docs.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#compression-support. With this functionality ADF should change the extension of the file when it is decompressed so 1234_567.csv.gz would become 1234_567.csv however, I’ve noticed that this doesn’t happen in all cases. […]

Read More

Categories

July 2016
MTWTFSS
« Jun Aug »
 123
45678910
11121314151617
18192021222324
25262728293031