Polybase And HDInsight

I have a post up on trying to integrate Polybase with HDInsight:

But now we run into a problem:  there are certain ports which need to be open for Polybase to work.  This includes port 50010 on each of the data nodes against which we want to run MapReduce jobs.  This goes back to the issue we see with spinning up data nodes in Docker:  ports are not available.  If you’ve put your HDInsight cluster into an Azure VNet and monkey around with ports, you might be able to open all of the ports necessary to get this working, but that’s a lot more than I’d want to mess with, as somebody who hasn’t taken the time to learn much about cloud networking.

As I mention in the post, I’d much rather build my own Hadoop cluster; I don’t think you save much maintenance time in the long run going with HDInsight.

Related Posts

Multi-Region Replication with Confluent Platform

David Arthur walks us through multi-region replication of Kafka clusters in the Confluent Platform 5.4 preview: Running a single Apache Kafka® cluster across multiple datacenters (DCs) is a common, yet somewhat taboo architecture. This architecture, referred to as a stretch cluster, provides several operational benefits and unlocks the door to many uses cases. Stretch clusters provide […]

Read More

Diagnosing TCP SACKs-Related Slowdown in Databricks

Chris Stevens, et al, walk us through troubleshooting a slowdown after using Linux images which have been patched for the TCP SACKs vulnerabilities: In order to figure out why the straggler task took 15 minutes, we needed to catch it in the act. We reran the benchmark while monitoring the Spark UI, knowing that all […]

Read More

Categories

October 2017
MTWTFSS
« Sep Nov »
 1
2345678
9101112131415
16171819202122
23242526272829
3031