Flink And Kafka Streams

Neha Narkhede and Stephan Ewen compare Apache Flink versus Kafka Streams:

Before Flink, users of stream processing frameworks had to make hard choices and trade off either latency, throughput, or result accuracy. Flink was the first open source framework (and still the only one), that has been demonstrated to deliver (1) throughput in the order oftens of millions of events per second in moderate clusters, (2) sub-second latency that can be as low as few 10s of milliseconds, (3) guaranteed exactly once semantics for application state, as well as exactly once end-to-end delivery with supported sources and sinks (e.g., pipelines from Kafka to Flink to HDFS or Cassandra), and (4) accurate results in the presence of out of order data arrival through its support for event time. Flink is based on a cluster architecture with master and worker nodes. Flink clusters are highly available, and can be deployed standalone or with resource managers such as YARN and Mesos. This architecture is what allows Flink to use a lightweight checkpointing mechanism to guarantee exactly-once results in the case of failures, as well allow easy and correct re-processing via savepoints without sacrificing latency or throughput. Finally, Flink is also a full-fledged batch processing framework, and, in addition to its DataStream and DataSet APIs (for stream and batch processing respectively), offers a variety of higher-level APIs and libraries, such as CEP (for Complex Event Processing), SQL and Table (for structured streams and tables), FlinkML (for Machine Learning), and Gelly (for graph processing). Flink has been proven to run very robustly in production at very large scale by several companies, powering applications that are used every day by end customers.

The upshot is that the two products don’t do exactly the same thing, and there might be room in your organization for the two of them.

Related Posts

Testing an Event-Driven System

Andy Chambers takes us through how to test an event-driven system: Each distinct service has a nice, pure data model with extensive unit tests, but now with new clients (and consequently new requirements) coming thick and fast, the number of these services is rapidly increasing. The testing guardian angel who sometimes visits your thoughts during […]

Read More

Processing Fixed-Width Files with Spark

Subhasish Guha shows how you can read a fixed-with file with Apache Spark: A fixed width file is a very common flat file format when working with SAP, Mainframe, and Web Logs. Converting the data into a dataframe using metadata is always a challenge for Spark Developers. This particular article talks about all kinds of […]

Read More


September 2016
« Aug Oct »