Apache Kafka is the de facto standard for event streaming today. Part of what makes Kafka so successful is its ability to handle tremendous volumes of data, with a throughput of millions of records per second, not unheard of in production environments. One part of Kafka’s design that makes this possible is partitioning.
Kafka uses partitions to spread the load of data across brokers in a cluster, and it’s also the unit of parallelism; more partitions mean higher throughput. Since Kafka works with key-value pairs, getting records with the same key on the same partition is essential.
Read on to learn a bit about how that partitioning works and why it’s important for application design, especially across multiple programming languages.