Spark Versus Flink

Sibanjan Das compares Apache Flink to Apache Spark:

The primitive concept of Apache Flink is the high-throughput and low-latency stream processing framework which also supports batch processing. The architecture is a flip of the other Big Data processing architectures where the primary notion was the batch processing framework. This is something that organizations have been looking for over the last decade. There is a need for platforms supporting low latency data movement for applications where even a millisecond delay can lead to severe consequences. The prospect of Apache Flink seems to be significant and looks like the goal for stream processing.

While comparing these two, don’t forget about Kafka Streams.  We’ve entered the streaming era for Hadoop & friends, and it’s an exciting time.

Related Posts

Temporal Tables with Flink

Marta Paes shows off a new feature in Apache Flink: In the 1.7 release, Flink has introduced the concept of temporal tables into its streaming SQL and Table API: parameterized views on append-only tables — or, any table that only allows records to be inserted, never updated or deleted — that are interpreted as a changelog and […]

Read More

Auto-Terminating Unused EMR Clusters

Praveen Krishamoorthy Ravikumar shows how you can use AWS Lambda to terminate ElasticMapReduce clusters which have been idle for a certain amount of time: To avoid this overhead, you must track the idleness of the EMR cluster and terminate it if it is running idle for long hours. There is the Amazon EMR native IsIdle Amazon […]

Read More

Categories

December 2016
MTWTFSS
« Nov Jan »
 1234
567891011
12131415161718
19202122232425
262728293031