Spark is widely adopted as a compute engine across most industries. Many older data platforms built on MapReduce jobs have already been migrated to Spark, and others are in the middle of that migration. In short, batch computation today is largely done with Spark, and organizations have built their infrastructure and expertise around it.
So the question now is: can Spark also handle batch consumption of the data that lands in Kafka? The answer is yes.
The advantages of doing so are clear: a unified batch computation platform, and the reuse of existing infrastructure, expertise, monitoring, and alerting.
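As a rough illustration of what this looks like in practice, here is a minimal sketch of a batch (non-streaming) read from Kafka using Spark's built-in Kafka source. The broker address and topic name are placeholders, and it assumes the `spark-sql-kafka` connector is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object KafkaBatchRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-batch-read")
      .getOrCreate()

    // spark.read (as opposed to readStream) runs a bounded, one-shot batch query
    // over the offset range defined by startingOffsets/endingOffsets.
    val df = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker address
      .option("subscribe", "events")                     // placeholder topic name
      .option("startingOffsets", "earliest")
      .option("endingOffsets", "latest")
      .load()

    // Kafka delivers key/value as binary; cast to strings to inspect the payload.
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "topic", "partition", "offset")
      .show(truncate = false)

    spark.stop()
  }
}
```

In a real job you would typically pin `startingOffsets` and `endingOffsets` to explicit per-partition offsets (or track them externally) so that each batch run picks up exactly where the previous one left off.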
Click through for a starting point on this approach, along with a few tips for avoiding common stumbling blocks.