Guy Shilo wants to figure out the right chunk size for a Kafka message:
I wrote a python program that runs a producer and a consumer for 30 minutes with different message sizes and measures how many messages per second it can deliver, or the Kafka cluster throughput.
I did not care about the message content, so the consumer only reads the messages from the topic and then discards them. I used a Three partition topic. I guess that on larger clusters with more partitions the performance will be better, but the message size – throughput ratio will remain roughly the same.
So I wrote a small python program that generates a dummy message in the desired size, then spawns two threads, one is a producer and the other is a consumer. The producer send the same message over and over and the consumer reads the messages from the topic and count how many messages it has read. The main program stops after 30 minutes but before it stops it prints how many messages were consumed and how many messages were consumed per second.
Read on for the results. More importantly, test in your own environment with your own equipment, as that value’s likely to differ a bit.