In my last two blog posts I walked through how to use Sqoop to perform full imports. Nightly full imports with overwrite have their place for small tables such as dimension tables. However, in real-world scenarios you’re also going to want a way to import only the delta, meaning the rows added or changed since the last time an import was run. Sqoop offers two modes for incremental imports: append and lastmodified.
Both kinds of incremental import can be run manually or created as a job using the “sqoop job” command. When running an incremental import manually from the command line, the “--last-value” argument specifies the reference value for the column named by “--check-column”. Alternatively, a Sqoop job tracks the last value of the check column in the job itself and uses it on subsequent runs as the WHERE predicate in the SQL statement, i.e. select columns from table where check-column > (last-max-check-column-value).
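To make the two invocation styles concrete, here is a minimal Python sketch that only assembles the command lines and the simplified predicate described above; it does not run Sqoop. The helper names, the JDBC URL, the table, the column, and the job name are all hypothetical, and credentials and target-directory options are left out for brevity. The predicate helper assumes a numeric check column, and the exact boundary conditions Sqoop generates may differ by mode and version.

```python
import shlex


def incremental_import_args(connect, table, check_column, last_value, mode="append"):
    """Build the argv for a manual `sqoop import` incremental run (hypothetical helper)."""
    if mode not in ("append", "lastmodified"):
        raise ValueError("mode must be 'append' or 'lastmodified'")
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--incremental", mode,
        "--check-column", check_column,
        # On a manual run, the caller has to supply the reference value.
        "--last-value", str(last_value),
    ]


def saved_job_args(job_name, import_args):
    """Wrap the same import as a saved job: `sqoop job --create <name> -- import ...`."""
    # Drop the leading "sqoop" so the tool name "import" follows the bare "--".
    return ["sqoop", "job", "--create", job_name, "--"] + import_args[1:]


def delta_predicate(check_column, last_value):
    """Simplified form of the predicate described in the text (numeric column assumed)."""
    return f"{check_column} > {last_value}"


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    manual = incremental_import_args(
        "jdbc:mysql://db.example.com/sales", "orders", "id", 1000
    )
    print(shlex.join(manual))
    print(shlex.join(saved_job_args("orders_delta", manual)))
    print(shlex.join(["sqoop", "job", "--exec", "orders_delta"]))
    print(delta_predicate("id", 1000))
```

With the manual form, whoever schedules the import has to remember the previous maximum and pass it in on every run. With the saved job, the “--last-value” given at creation time is only the starting point, and each “sqoop job --exec” run picks up from the value the job stored on the previous run.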
This is where Sqoop starts to break down for me, and Jon lists some of the issues in the post.