Ajay Jagannathan shows how to integrate a SQL Server instance running PolyBase with a Cloudera Hadoop cluster, using Active Directory for all accounts:
For all usernames and principals, we will use a suffix such as Cluster14 so that names scale across clusters.
- Active Directory setup:
- Create a new Organizational Unit for Hadoop users in AD, for example OU=Hadoop, OU=CORP, DC=CONTOSO, DC=COM.
- Create an hdfs superuser: hdfsCluster14@CORP.CONTOSO.COM
- Cloudera Manager requires an Account Manager user that has privileges to create other accounts in Active Directory. You can use the Active Directory Delegate Control wizard to grant this user permission to create other users by checking the option to “Create, delete and manage user accounts”. Create a user clouderaCluster14@CORP.CONTOSO.COM in OU=Hadoop, OU=CORP, DC=CONTOSO, DC=COM as an Account Manager.
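The naming convention in the steps above can be sketched in a few lines of Python. This is only an illustration: the helper name `ad_names` and its parameters are hypothetical and not part of the original walkthrough, and the defaults simply mirror the CONTOSO example values.

```python
def ad_names(suffix, realm="CORP.CONTOSO.COM",
             ou_path=("Hadoop", "CORP"), dc_parts=("CONTOSO", "COM")):
    """Build the AD names used in the walkthrough for one cluster suffix.

    Hypothetical helper: the defaults are the example values from the post,
    not something the original article defines in code.
    """
    # Distinguished name of the OU that holds the Hadoop accounts.
    base_dn = ",".join(
        [f"OU={ou}" for ou in ou_path] + [f"DC={dc}" for dc in dc_parts]
    )
    return {
        "base_dn": base_dn,
        # hdfs superuser principal, e.g. hdfsCluster14@CORP.CONTOSO.COM
        "hdfs_superuser": f"hdfs{suffix}@{realm}",
        # Account Manager user that Cloudera Manager uses to create accounts
        "account_manager": f"cloudera{suffix}@{realm}",
    }


names = ad_names("Cluster14")
print(names["hdfs_superuser"])
print(names["account_manager"])
print(names["base_dn"])
```

Changing the suffix (say, to `Cluster15`) yields a parallel set of names for another cluster without colliding with the first.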
Install the OpenLDAP utilities (openldap-clients on RHEL/CentOS) on the Cloudera Manager server host. Install the Kerberos client (krb5-workstation on RHEL/CentOS) on all hosts in the cluster. This step requires an internet connection on the Hadoop servers. If a server has no internet connection, you can download the RPM packages and install them manually.
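As a rough sketch of the install step, the snippet below builds the command line for a given host without running it. It assumes `yum` is the package manager on the RHEL/CentOS hosts and that an offline host installs from RPM files you have already copied over. The function name `install_command` and the file names in the usage example are hypothetical.

```python
def install_command(is_cm_server, local_rpms=None):
    """Return the argv list for installing the client packages on one host.

    Hypothetical helper. Assumes yum on RHEL/CentOS; for hosts without
    internet access, pass paths to RPM files that were downloaded beforehand.
    """
    if local_rpms:
        # Offline host: install the downloaded RPM files directly.
        return ["rpm", "-ivh", *local_rpms]

    # Every cluster host needs the Kerberos client.
    packages = ["krb5-workstation"]
    if is_cm_server:
        # Only the Cloudera Manager server host needs the OpenLDAP utilities.
        packages.append("openldap-clients")
    return ["yum", "install", "-y", *packages]


print(" ".join(install_command(is_cm_server=True)))
print(" ".join(install_command(is_cm_server=False)))
# Placeholder file name, for illustration only.
print(" ".join(install_command(False, local_rpms=["krb5-workstation.rpm"])))
```

Returning the argument list rather than executing it keeps the sketch safe to run anywhere; the list could be handed to `subprocess.run` on the target host.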
This is absolutely worth the read.