Press "Enter" to skip to content

Generating Identity Integers in Spark

The Hadoop in Real World team hits a favorite topic of mine, monotonicity:

We would like to add an index column with a unique incrementing value like below.

There are few options to implement this use case in Spark. Let’s see them one by one.

These are all functions which you apply to existing data rather than generating new values like IDENTITY or Sequences in relational databases.