Jack Vanlightly creates a problem:
In this post we’re going to see how
share.acquire.mode=record_limit combined with:
- fewer consumers than partitions
- and various cases of “partition skew”
…can result in subpar performance with share groups.
I stumbled on these issues when running large sets of dimensional tests with Dimster’s explore-limits mode, which finds the highest sustainable throughput while staying within a target end-to-end latency. There was a specific subset of tests that explore-limits mode would consistently fail to complete, and they all happened to use record_limit with a consumer count lower than the partition count. In this post, we’ll see why Dimster had such a hard time with this combination.
Click through for the details, as well as how to mitigate this sort of scenario.