Giovanni Lanzani walks through some of the difficulties of getting LDAP working with Hadoop:
This section could probably have far fewer workarounds if I knew more about LDAP.
But I’m a data scientist at heart and I want to get things done.
If you've ever dealt with Hadoop, you know that there are a bunch of non-interactive users, i.e. users who are not supposed to log in, such as hdfs, spark, hadoop, etc. These users are important to have. However, the groups with the same names are also important to have. For example, when using airflow and launching a spark job, the log folders will be created under the airflow user, in the spark group. LDAP, however, doesn't allow you, to my knowledge, to have overlapping user/group names, as Unix does.
The way I solved it was to create, in LDAP, a spark_user (or hdfs_user, or …) to work around this limitation.
Also, Giovanni apparently lives in an interesting neighborhood.