PolyBase and Dockerized Hadoop

I have a solution to a problem which vexed me for quite some time:

Quite some time ago, I posted about PolyBase and the Hortonworks Data Platform 2.5 (and later) sandbox.

The summary of the problem is that data nodes in HDP 2.5 and later are on a Docker private network. For most cases, this works fine, but PolyBase expects publicly accessible data nodes by default—one of its performance enhancements with Hadoop was to have PolyBase scale-out group members interact directly with the Hadoop data nodes rather than having everything go through the NameNode and PolyBase control node.

Click through for the solution.

Related Posts

Spark Streaming DStreams

Manish Mishra explains the fundamental abstraction of Spark Streaming: Before going into details of the operations available on the DStream API, let us look at the input sources from which we can start a Stream. There are multiple ways in which we can get the inputs from e.g. Kafka, Flume, etc. Or simple Idle files. […]

Read More

Why Root Containers are Troublesome

Andrew Pruski explains to us why it can be bad to have a container user running as root: Recently I noticed that Microsoft uploaded a new dockerfile to the mssql-docker repository on Github. This dockerfile was under the mssql-server-linux-non-root directory and (you guessed it) allows SQL Server containers to run as non-root. But why is running a container as […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

September 2019
MTWTFSS
« Aug  
 1
2345678
9101112131415
16171819202122
23242526272829
30