Imagine you have thousands or millions of devices sending data to your data lake. This data must be managed with speed, security, and reliability. In addition, for legal reasons you must partition data by country, device, and city. These requirements seem reasonable, and in 2019, stream-processing platforms must be able to deal with them.
But how well do they? Kafka is not known to work well when there are thousands of topics and partitions even if the data is not massive. You can see how complicated it can be to try to solve performance challenges in these scenarios.
I like this sort of competition, as I know Kafka will step up their game as a result.