Before the addition of Kafka Streams support, HDP and HDF supported two stream processing engines: Spark Structured Streaming and Streaming Analytics Manager (SAM) with Storm. So naturally, this begets the following question:
Why add a third stream processing engine to the platform?
With Spark Structured Streaming and SAM with Storm both available, customers could pick the right stream processing engine based on their non-functional requirements and use cases. However, neither of these engines addressed the following types of requirements that we saw from our customers:
And this doesn’t even include Samza or Flink, two other popular streaming engines.
My biased answer is, forget Storm. If you have a legacy implementation of it, that’s fine, but I wouldn’t recommend building new streaming implementations on it. After that, you can compare the two remaining competitors, Kafka Streams and Spark (as well as Samza and Flink), to see which fits your environment better. I don’t think there are many scenarios where you would completely regret going with, say, Kafka Streams instead of Spark Streaming. Each has its advantages, but they’re not so radically different.