The Optimal Kafka Message Size

Guy Shilo wants to figure out the right chunk size for a Kafka message:

I wrote a python program that runs a producer and a consumer for 30 minutes with different message sizes and measures how many messages per second it can deliver, or the Kafka cluster throughput.

I did not care about the message content, so the consumer only reads the messages from the topic and then discards them. I used a Three partition topic. I guess that on larger clusters with more partitions the performance will be better, but the message size – throughput ratio will remain roughly the same.

So I wrote a small python program that generates a dummy message in the desired size, then spawns two threads, one is a producer and the other is a consumer. The producer send the same message over and over and the consumer reads the messages from the topic and count how many messages it has read. The main program stops after 30 minutes but before it stops it prints how many messages were consumed and how many messages were consumed per second.

Read on for the results.  More importantly, test in your own environment with your own equipment, as that value’s likely to differ a bit.

Related Posts

Load Testing Tools For SQL Server

Brent Ozar shares a list of load testing tools for SQL Server: One thing I need you to understand first: you have to provide the database and the queries. Almost all of the tools in this post, except the last one, are designed to help you run queries, but they don’t include the queries. The […]

Read More

Troubleshooting Spark Performance

Bikas Saha and Mridul Murlidharan explain some of the basics of performance tuning with Apache Spark: Our objective was to build a system that would provide an intuitive insight into Spark jobs that not just provides visibility but also codifies the best practices and deep experience we have gained after years of debugging and optimizing […]

Read More


March 2018
« Feb Apr »