Polybase And HDInsight

I have a post up on trying to integrate Polybase with HDInsight:

But now we run into a problem:  there are certain ports which need to be open for Polybase to work.  This includes port 50010 on each of the data nodes against which we want to run MapReduce jobs.  This goes back to the issue we see with spinning up data nodes in Docker:  ports are not available.  If you’ve put your HDInsight cluster into an Azure VNet and monkey around with ports, you might be able to open all of the ports necessary to get this working, but that’s a lot more than I’d want to mess with, as somebody who hasn’t taken the time to learn much about cloud networking.

As I mention in the post, I’d much rather build my own Hadoop cluster; I don’t think you save much maintenance time in the long run going with HDInsight.

Related Posts

Hyperparameter Tuning with MLflow

Joseph Bradley shows how you can perform hyperparameter tuning of an MLlib model with MLflow: Apache Spark MLlib users often tune hyperparameters using MLlib’s built-in tools CrossValidator and TrainValidationSplit.  These use grid search to try out a user-specified set of hyperparameter values; see the Spark docs on tuning for more info. Databricks Runtime 5.3 and 5.3 ML and above support […]

Read More

TensorFrames: Spark Plus TensorFlow

Adi Polak gives us an introduction to TensorFrames: In all TensorFrames functionality, the DataFrame is sent together with the computations graph. The DataFrame represents the distributed data, meaning in every machine there is a chunk of the data that will go through the graph operations/ transformations. This will happen in every machine with the relevant […]

Read More


October 2017
« Sep Nov »