Stream processing walkthrough
The entire pattern can be implemented in a few simple steps:
Set up Kafka on AWS.
Spin up an EMR 5.0 cluster with Hadoop, Hive, and Spark.
Create a Kafka topic.
Run the Spark Streaming app to process clickstream events.
Use the Kafka producer app to publish clickstream events into Kafka topic.
Explore clickstream events data with SparkSQL.
This is a pretty easy-to-follow walkthrough with some good tips at the end.