In previous articles, I’ve walked through using Sqoop to import data to HDFS. I’ve also detailed how to perform full and incremental imports to Hive external and Hive managed tables.
In this article, I’m going to show you how to automate the execution of Sqoop jobs via cron.
However, before we get to scheduling, we need to address security. In prior examples I’ve used -P to prompt interactively for the database password. With a scheduled job, this isn’t going to work, because nobody is at the terminal to answer the prompt. Fortunately, Sqoop provides the --password-alias argument, which lets us refer to a password stored in a protected keystore by its alias instead of typing it in or putting it on the command line.
That particular keystore tie-in works quite smoothly in my experience.
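To make the keystore approach concrete, here is a minimal sketch of how the pieces might fit together. The provider path, alias, JDBC URL, user name, table, and target directory are all hypothetical placeholders, and the `run` and `import_orders` helpers are my own illustration rather than anything Sqoop ships. The sketch assumes the alias was created once, by hand, with `hadoop credential create`, which prompts for the password and stores it in a JCEKS keystore.

```shell
#!/bin/sh
# Hypothetical values: substitute your own keystore location and alias.
PROVIDER="jceks://hdfs/user/etl/mysql.password.jceks"
ALIAS="mysql.password.alias"

# One-time, interactive setup (run by hand; prompts for the password):
#   hadoop credential create "$ALIAS" -provider "$PROVIDER"

# Dry run by default so the command can be inspected safely.
# Set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Non-interactive import: the password is resolved from the keystore
# through --password-alias, so no -P prompt is needed.
import_orders() {
  run sqoop import \
    -Dhadoop.security.credential.provider.path="$PROVIDER" \
    --connect "jdbc:mysql://dbhost.example.com/sales" \
    --username etl_user \
    --password-alias "$ALIAS" \
    --table orders \
    --target-dir /user/etl/orders
}

import_orders
```

Because nothing in this command waits for input, it is the kind of invocation that can later be placed in a script and run from cron. Note that the generic `-D` option has to come directly after the tool name (`import`), before the tool-specific arguments.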