Tracking Kafka Consumer Lag

Simarpreet Kaur Monga has a Scala-based example showing how to calculate Kafka offset lag for consumers:

The Consumer can subscribe to multiple topics, you need to pass the list of topics you want to consume from. For the sake of simplicity, I have just passed a single topic to consume from.

Now that the consumer has subscribed to the topic, it can consume from that topic.

The consumer maintains an offset to keep the track of the next record it needs to read.

Now, let us see how we can find the consumer offsets.

The Consumer offsets can be found using the method offset of class ConsumerRecord. This offset points to the record in a Kafka partition. The consumer consumes the records from the topic in the form of an object of class ConsumerRecord. The class ConsumerRecord also consists of a topic name and a partition number from which the record is being received, and a timestamp as marked by the corresponding ProducerRecord (the record sent by the producer).

Click through for the rest of the story.

Related Posts

Running Hive LLAP As A YARN Service

Gour Saha, et al, demonstrate running Apache Hive LLAP as a YARN service: Making LLAP as a first-class YARN service also enables us to use some of the other powerful features in YARN that were added in Apache Hadoop 3.0 / 3.1, some of them are noted below. Advanced container placement scheduling such as affinity […]

Read More

Flattening JSON Data With Databricks

Ivan Vazharov gives us a Databricks notebook to parse and flatten JSON using PySpark: With Databricks you get: An easy way to infer the JSON schema and avoid creating it manually Subtle changes in the JSON schema won’t break things The ability to explode nested lists into rows in a very easy way (see the […]

Read More

Categories

November 2017
MTWTFSS
« Oct Dec »
 12345
6789101112
13141516171819
20212223242526
27282930