So You *Really* Want To Monitor Kafka…

Yeva Byzek walks through Confluent Platform:

Kafka exposes hundreds of metrics. Some of them are per broker, per client, per topic, and per partition, and so the number of metrics scales up as the cluster grows. For an average-size Kafka cluster, the number of metrics very quickly bloats to the thousands.

Warning: I am about to disappoint you. You probably recognize that you realistically cannot monitor every single available metric. So you are probably hoping that in this blog post I will filter down the list of metrics to a dozen of the most critical ones, which you would then push through some generic monitoring tool, and then be done with setting up “monitoring.” However, monitoring distributed systems like Kafka is not that simple, and so there is no such list. Keep reading to understand the problems you should be solving, and how to solve them in a robust monitoring solution specifically designed for Kafka.

A common pitfall of generic monitoring tools is that they import all available metrics from a variety of systems into a metrics swamp. Even with a comprehensive list of metrics, there is a limit to what can be achieved with no Kafka context nor Kafka expertise to determine which metrics are important and which ones are not. A metrics swamp cannot produce valuable insight from the data nor provide answers to the critical business questions we asked earlier.

This is an information-dense post that you’ll want to read if you work with Apache Kafka.

Related Posts

Handling Errors in Kafka Connect

Robin Moffatt shows us some techniques for handling errors in your Kafka topics: We’ve seen how setting errors.tolerance = all will enable Kafka Connect to just ignore bad messages. When it does, by default it won’t log the fact that messages are being dropped. If you do set errors.tolerance = all, make sure you’ve carefully thought through […]

Read More

Batch Consumption from Kafka with Spark

Swapnil Chougule shares a few tips on performing batch processing of a Kafka topic using Apache Spark: Spark as a compute engine is very widely accepted by most industries. Most of the old data platforms based on MapReduce jobs have been migrated to Spark-based jobs, and some are in the phase of migration. In short, […]

Read More

Categories

December 2017
MTWTFSS
« Nov Jan »
 123
45678910
11121314151617
18192021222324
25262728293031