Scaling Kafka Streams

Kevin Feasel



Michael Noll discusses elastic scaling of Kafka Streams:

Third, how many instances can or should you run for your application?  Is there an upper limit for the number of instances and, similarly, for the parallelism of your application?  In a nutshell, the parallelism of a Kafka Streams application — similar to the parallelism of Kafka — is primarily determined by the number of partitions of the input topic(s) from which your application is reading. For example, if your application reads from a single topic that has 10 partitions, then you can run up to 10 instances of your applications (note that you can run further instances but these will be idle).  In summary, the number of topic partitions is the upper limit for the parallelism of your Kafka Streams application and thus for the number of running instances of your application.  Note: A scaling/parallelism caveat here is that the balance of the processing work between application instances depends on how well data messages are balanced between partitions.

Check it out.  Kafka Streams is a potential alternative to Spark Streaming and Storm for real-time (for some definition of “real-time”) distributed computing.

Related Posts

MRAppMaster Errors Running MapReduce Jobs

I have a post looking at potential causes when PolyBase MapReduce jobs are unable to find the MRAppMaster class: Let me tell you about one of my least favorite things I like to see in PolyBase: Error: Could not find or load main class This error is not limited to PolyBase but is instead […]

Read More

Database-First or Kafka-First for Event Streaming

Gwen Shapiro takes us through a scenario where database-first writes for event streaming makes the most sense: Note that the DB does quite a lot for you: it enforces serializability, locks, your logical constraints, etc. If the DB is distributed (Vitesse, Cockroach, Spanner, Yugabyte), it does even more. If you were to go Kafka-first… well, […]

Read More


July 2016
« Jun Aug »