Talking To Secure Hadoop Clusters

Mubashir Kazia shows how to connect to a secured Hadoop cluster using Active Directory:

The primary form of strong authentication used on a secure cluster is Kerberos. Kerberos supports credential delegation, where a server process to which a user has authenticated can perform actions on behalf of that user. This involves the server process accessing databases or other web services as the authenticated user. The form of delegation Kerberos historically supported is now called “full delegation”: the user’s Ticket Granting Ticket (TGT) is made available to the server process, which can then authenticate to any service the user is authorized to access. Until recently, most Kerberos Key Distribution Centers (KDCs) other than Active Directory supported only this form of delegation, and Java through Java 7 supported only this form as well. Starting with Java 8, Java supports Kerberos constrained delegation (S4U2Proxy): if the KDC supports it, you can specify exactly which services the server process may access on the user’s behalf.
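To make the Java 8 constrained-delegation support concrete, here is a minimal sketch (not from the original article) of how a middle-tier service can use the JDK's GSS-API extensions to obtain a security context for a backend service on behalf of an end user. The principal names and SPN are placeholders, and the service's own Kerberos login is assumed to already be configured via JAAS and a keytab:

```java
import org.ietf.jgss.GSSContext;
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;
import org.ietf.jgss.GSSManager;
import org.ietf.jgss.GSSName;
import org.ietf.jgss.Oid;
import com.sun.security.jgss.ExtendedGSSCredential;

public class ConstrainedDelegationSketch {
    // Returns a GSSContext initiated to backendSpn (e.g. "HTTP@backend.example.com")
    // on behalf of userPrincipal (e.g. "alice@EXAMPLE.COM").
    public static GSSContext contextFor(String userPrincipal, String backendSpn)
            throws GSSException {
        Oid krb5 = new Oid("1.2.840.113554.1.2.2"); // Kerberos v5 mechanism OID
        GSSManager manager = GSSManager.getInstance();

        // Credential of the middle-tier service itself, assumed to come from the
        // JAAS login (keytab) already established for this process.
        GSSCredential serviceCred = manager.createCredential(
                null, GSSCredential.DEFAULT_LIFETIME, krb5, GSSCredential.INITIATE_ONLY);

        // S4U2Self: obtain a credential on behalf of the user without their TGT.
        GSSName user = manager.createName(userPrincipal, GSSName.NT_USER_NAME);
        GSSCredential userCred = ((ExtendedGSSCredential) serviceCred).impersonate(user);

        // S4U2Proxy: the KDC issues a service ticket here only if this service is
        // allowed to delegate to backendSpn (msDS-AllowedToDelegateTo in AD).
        GSSName backend = manager.createName(backendSpn, GSSName.NT_HOSTBASED_SERVICE);
        return manager.createContext(backend, krb5, userCred, GSSContext.DEFAULT_LIFETIME);
    }
}
```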

Hadoop’s security framework implements impersonation, or proxy, support that is independent of Kerberos delegation. With Hadoop impersonation, you can grant certain accounts proxy privileges, allowing those proxy accounts to access Hadoop resources or run jobs on behalf of other users. You can restrict a proxy account to acting only on behalf of users who are members of certain groups, and/or only for connections originating from certain hosts. However, you cannot restrict proxy privileges to only certain services within the cluster.
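As a rough illustration of how Hadoop impersonation looks in application code (again, not taken from the article), the sketch below logs in as a proxy/service account and then runs an HDFS call as another user. The principal, keytab path, and user name are assumptions, and the cluster must grant the proxy account hadoop.proxyuser.* privileges in core-site.xml:

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        UserGroupInformation.setConfiguration(conf);

        // Kerberos login as the proxy (service) account; the principal and
        // keytab path are placeholders.
        UserGroupInformation.loginUserFromKeytab(
                "svc_gateway@EXAMPLE.COM", "/etc/security/keytabs/svc_gateway.keytab");

        // Proxy UGI that acts on behalf of the end user "alice".
        UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
                "alice", UserGroupInformation.getLoginUser());

        // Everything inside doAs runs as "alice", subject to the
        // hadoop.proxyuser.svc_gateway.{hosts,groups,users} restrictions
        // configured on the cluster.
        proxyUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
            FileSystem fs = FileSystem.get(conf);
            fs.listStatus(new Path("/user/alice"));
            return null;
        });
    }
}
```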

What we discuss in this article is how to set up Kerberos constrained delegation and use it to access a secure cluster. The example here involves Apache Tomcat, but you can easily extend it to other Java application servers.
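For context, a Tomcat setup along these lines typically needs a JAAS login configuration for the service principal. The snippet below is a hedged example only; the entry name, principal, and keytab path are placeholders rather than the article's actual configuration:

```
TomcatService {
    com.sun.security.auth.module.Krb5LoginModule required
        useKeyTab=true
        storeKey=true
        doNotPrompt=true
        isInitiator=true
        principal="HTTP/tomcat.example.com@EXAMPLE.COM"
        keyTab="/etc/security/keytabs/tomcat.keytab";
};
```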

This is a good article showing specific details on using Kerberos in applications connecting to Hadoop.

