Press "Enter" to skip to content

Stateful Streaming with Spark

Sarfaraz Hussain continues a series on Spark Streaming:

Under the hood, Structured Streaming processes data as micro-batches (its default behavior).

The state is versioned between micro-batches while the streaming query runs. As the series of incremental execution plans is generated (discussed in Part 2), each execution plan knows which version of the state it needs to read from.

Each micro-batch reads the previous version of the state data (i.e. the previous running count), updates it, and creates a new version. Each of these versions gets checkpointed into the checkpoint location we provided in the query.
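
To make this concrete, here is a minimal sketch of a stateful query: a running word count over a socket source, in the style of the standard Structured Streaming example. The host, port, and checkpoint path are placeholders for illustration, not taken from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.OutputMode

object RunningCountExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("StatefulRunningCount")
      .getOrCreate()

    import spark.implicits._

    // Read lines from a socket source (placeholder host/port for illustration).
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // The groupBy/count aggregation is stateful: each micro-batch reads the
    // previous version of the counts, updates it, and writes a new version.
    val wordCounts = lines.as[String]
      .flatMap(_.split(" "))
      .groupBy("value")
      .count()

    // The checkpoint location is where each version of the state is persisted.
    val query = wordCounts.writeStream
      .outputMode(OutputMode.Complete())
      .format("console")
      .option("checkpointLocation", "/tmp/running-count-checkpoint")
      .start()

    query.awaitTermination()
  }
}
```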

Read on to understand the implications of this and what it allows you to do.