Share Groups and Sub-Optimal Performance

Jack Vanlightly creates a problem:

In this post we’re going to see how share.acquire.mode=record_limit combined with:

fewer consumers than partitions

and various cases of “partition skew”

…can result in subpar performance with share groups.

I stumbled on these issues when running large sets of dimensional tests with Dimster's explore-limits mode, which finds the highest sustainable throughput while staying within a target end-to-end latency. Explore-limits mode consistently failed to complete one specific subset of the tests, and every test in that subset used record_limit with fewer consumers than partitions. In this post, we'll understand why Dimster had such a hard time with this combination.

Click through for the details, as well as how to mitigate this sort of scenario.
