Kafka Enrichment

I have an article on enriching data stored in a Kafka topic:

We’re going a bunch of setup work here, so let’s take it from the top.  First, I declare a consumer group, which I’m calling “Airplane Enricher.”  Kafka uses the concept of consumer groups to allow consumers to work in parallel.  Imagine that we have ten separate servers available to process messages from the Flights topic.  Each flight message is independent, so it doesn’t matter which consumer gets it.  What does matter, though, is that multiple consumers don’t get the same message, as that’s a waste of resources and could lead to duplicate data processing, which would be bad.

The way Kafka works around this is to use consumer groups:  within a consumer group, only one consumer will get a particular message.  That way, I can have my ten servers processing messages “for real” and maybe have another consumer in a different consumer group just reading through the messages getting a count of how many records are in the topic.  Once you treat topics as logs rather than queues, consumer design changes significantly.

This is a fairly lengthy read, but directly business-applicable, so I think it’s well worth it.

Related Posts

Enabling Exactly-Once Kafka Streams

Guozhang Wang wraps up his exactly-once series in Kafka: When restarting the application from the point of failure, we would then try to resume processing from the previously remembered position in the input Kafka topic, i.e. the committed offset. However, since the application was not able to commit the offset of the processed message A before crashing […]

Read More

Avro Schemas In Kafka

Stephane Maarek explains the value of using Apache Avro as a schema structure for your Kafka topics: Avro has support for primitive types ( int, string, long, bytes, etc…), complex types (enum, arrays, unions, optionals), logical types (dates, timestamp-millis, decimal), and data record (name and namespace). All the types you’ll ever need. Avro has support for embedded documentation. Although documentation is optional, in my workflow I […]

Read More

Categories

October 2016
MTWTFSS
« Sep Nov »
 12
3456789
10111213141516
17181920212223
24252627282930
31