Subscription Versus Assignment In Kafka

Paolo Patierno explains why you shouldn’t mix subscribe() and assign() in Kafka:

Another great advantage of consumer grouping is the rebalancing feature. When a consumer joins a group, if there are still enough partitions available (i.e. we haven’t reached the limit of one consumer per partition), a rebalance starts and the partitions are reassigned to the current consumers plus the new one. In the same way, if a consumer leaves a group, the partitions are reassigned to the remaining consumers.
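To make the join/leave behavior concrete, here is a minimal sketch in plain Java. It is a toy model, not Kafka's real partition assignor: the class name `RebalanceSketch`, the consumer names, and the round-robin strategy are all assumptions chosen for illustration. It only shows that partitions get redistributed whenever group membership changes, and that a consumer beyond the partition count ends up with nothing to read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a hypothetical stand-in for a rebalance, NOT Kafka's assignor.
class RebalanceSketch {

    // Spread partitions 0..partitionCount-1 over the current group members, round-robin.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return result; // nobody in the group, nothing to assign
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            result.get(owner).add(partition);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> group = new ArrayList<>(Arrays.asList("c1", "c2"));
        System.out.println("initial:     " + assign(group, 4));

        group.add("c3"); // a consumer joins: every partition is redistributed
        System.out.println("after join:  " + assign(group, 4));

        group.remove("c1"); // a consumer leaves: the remaining members take over
        System.out.println("after leave: " + assign(group, 4));
    }
}
```

In this model, calling `assign` again after each membership change plays the role of the rebalance; in Kafka itself the group coordinator and the configured assignor handle this for you.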

What I have said so far holds when using the subscribe() method provided by the KafkaConsumer API. This method forces you to assign the consumer to a consumer group by setting the group.id property, because the group is needed for rebalancing. In any case, the consumer does not get to choose which partitions it reads from. In general, the first consumer to join the group performs the assignment, while the other consumers simply join the group.
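As a rough illustration of the two styles, the sketch below uses the Java client. It assumes the `kafka-clients` library is on the classpath; the broker address `localhost:9092`, the topic `demo-topic`, the group `demo-group`, and the class and method names are all hypothetical. No broker needs to be running, because the example never calls `poll()`. In the client versions I am aware of, calling `assign()` on a consumer that has already called `subscribe()` throws an `IllegalStateException`, which is what the helper checks for.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

// Hypothetical demo class; topic, group, and broker address are assumptions.
class SubscribeVsAssign {

    static Properties props(String groupId) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        p.put("key.deserializer", StringDeserializer.class.getName());
        p.put("value.deserializer", StringDeserializer.class.getName());
        p.put("enable.auto.commit", "false");
        p.put("group.id", groupId); // needed by subscribe() for group management
        return p;
    }

    // Returns true if the client refuses to mix subscribe() and assign().
    static boolean mixingIsRejected() {
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props("demo-group"))) {
            // Group-managed: partitions are handed out by the rebalance, not chosen here.
            consumer.subscribe(Collections.singletonList("demo-topic"));
            try {
                // Manual: the caller picks partition 0 explicitly.
                consumer.assign(Collections.singletonList(new TopicPartition("demo-topic", 0)));
                return false;
            } catch (IllegalStateException e) {
                return true; // the two modes are mutually exclusive on one consumer
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("mixing rejected: " + mixingIsRejected());
    }
}
```

A consumer that needs explicit control over partitions would call only `assign()`, while one that wants automatic rebalancing would call only `subscribe()`.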

Read on to learn more.

