Jean-Georges Perrin introduces checkpoints on Spark data frames:
Basically, I use a checkpoint if I want to freeze the content of my data frame before I do something else. It can be in the scenario of iterative algorithms (as mentioned in the Javadoc), but also in recursive algorithms, or simply when branching out a data frame to run different kinds of analytics on both branches.
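A minimal sketch of that branching scenario, in Java, might look like the following. The file path, checkpoint directory, and column names are hypothetical; the point is that checkpoint() materializes the data frame so that both branches start from the saved data rather than re-running the original lineage.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CheckpointBranchingApp {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("Checkpoint branching example")
        .master("local[*]")
        .getOrCreate();

    // checkpoint() needs a reliable directory to materialize the data frame
    spark.sparkContext().setCheckpointDir("/tmp/spark-checkpoints");

    Dataset<Row> df = spark.read()
        .option("header", "true")
        .csv("data/orders.csv");   // hypothetical input file

    // Freeze the content (and truncate the lineage) here
    Dataset<Row> frozen = df.checkpoint();

    // Branch 1: one kind of analytics on the frozen data
    frozen.groupBy("country").count().show();

    // Branch 2: a different kind of analytics on the same frozen data
    frozen.describe("amount").show();

    spark.stop();
  }
}
```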
Spark has offered checkpoints on streaming since its early versions (at least v1.2.0), but checkpoints on data frames are a different beast.
This could also be very useful for a quality control flow: perform operation A, and if it doesn’t generate good enough results, roll back and try operation B.
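As a sketch of that quality-control flow, the snippet below (reusing the hypothetical frozen data frame and columns from the previous example, with a made-up quality threshold) tries operation A first and falls back to operation B if the result looks inadequate:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.col;

public class QualityControlFlow {

  // Tries operation A on the checkpointed frame; if the result is too thin,
  // "rolls back" to the frozen checkpoint and tries operation B instead.
  static Dataset<Row> bestEffort(Dataset<Row> frozen) {
    // Operation A: an aggressive filter (hypothetical threshold)
    Dataset<Row> resultA = frozen.filter(col("amount").gt(1000));

    // Hypothetical quality gate: accept A only if enough rows survive
    if (resultA.count() >= 0.5 * frozen.count()) {
      return resultA;
    }

    // Rolling back is cheap: the checkpoint is already materialized,
    // so operation B starts from the saved data, not the full lineage
    return frozen.filter(col("amount").gt(100));
  }
}
```

Because the checkpointed data frame is written out to storage, retrying from it avoids recomputing everything upstream of the checkpoint each time an alternative operation is attempted.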