Prasad Alle shows how to integrate Kafka with Spark Streaming on AWS:
Stream processing walkthrough
The entire pattern can be implemented in a few simple steps:
- Set up Kafka on AWS.
- Spin up an EMR 5.0 cluster with Hadoop, Hive, and Spark.
- Create a Kafka topic.
- Run the Spark Streaming app to process clickstream events.
- Use the Kafka producer app to publish clickstream events into Kafka topic.
- Explore clickstream events data with SparkSQL.
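
To give a feel for the producer step, here is a minimal Python sketch of publishing clickstream events to a Kafka topic. It is not the producer app from the walkthrough. It assumes the `kafka-python` package, a broker reachable at `localhost:9092`, and a topic named `clickstream`. The event fields (`user_id`, `page`, `timestamp_ms`) are made up for illustration.

```python
import json
import random
import time

# Assumptions for illustration only: the topic name and broker address
# are hypothetical and do not come from the original walkthrough.
TOPIC = "clickstream"
BOOTSTRAP_SERVERS = "localhost:9092"

PAGES = ["/home", "/search", "/product", "/cart", "/checkout"]


def make_click_event(user_id, rng=random):
    """Build one illustrative clickstream event (field names are assumptions)."""
    return {
        "user_id": user_id,
        "page": rng.choice(PAGES),
        "timestamp_ms": int(time.time() * 1000),
    }


def serialize(event):
    """Encode an event as UTF-8 JSON bytes, the form sent to Kafka."""
    return json.dumps(event).encode("utf-8")


def publish_events(count, topic=TOPIC, servers=BOOTSTRAP_SERVERS):
    """Send `count` events to the topic, then flush and close the producer."""
    # Imported lazily so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=servers, value_serializer=serialize)
    try:
        for i in range(count):
            producer.send(topic, make_click_event(user_id=i % 10))
        producer.flush()  # block until buffered records are delivered
    finally:
        producer.close()
```

With a broker running and the topic created, calling `publish_events(100)` would push a hundred JSON events that a Spark Streaming job subscribed to the same topic could then pick up.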
This is a pretty easy-to-follow walkthrough with some good tips at the end.