Flink: Streams Versus Batches

Kevin Jacobs has an article comparing Apache Flink to Spark Streaming:

The other type of data are data streams. Data streams can be visualized by water flowing from a tap to a sink. This process is not ending. The nice property of streams is that you can consume the stream while it is flowing. There is almost no latency involved for consuming a stream.

Apache Spark is fundamentally based on batches of data. By that, for all processing jobs at least some latency is introduced. Apache Flink on the other hand is fundamentally based on streams. Let’s take a look at some evidence for the difference in latency.

Read the whole thing.

Related Posts

Configuring Kafka Streams For Least Privilege

Gwen Shapira explains how we can assign minimal rights to Kafka Streams and KSQL users: The principle of least privilege dictates that each user and application will have the minimal privileges required to do their job. When applied to Apache Kafka® and its Streams API, it usually means that each team and application will have read and […]

Read More

Working With Key-Value Pairs In Spark

Teena Vashist shows us a few of the functions available with Spark for working with key-value pairs: 1. Creating Key/Value Pair RDD:  The pair RDD arranges the data of a row into two parts. The first part is the Key and the second part is the Value. In the below example, I used a parallelize method to create a RDD, […]

Read More


August 2016
« Jul Sep »