So You Want To Monitor Kafka…

Ben Summer reviews a Kafka catstrophe and explains how to avoid it:

Here at New Relic, the Edge team is responsible for the pipelines that handle all the data coming into our company. We were an early adopter of Apache Kafka, which we began using to power this data pipeline. Our initial results were outstanding. Our cluster handled any amount of data we threw at it; it showed incredible fault tolerance and scaled horizontally. Our implementation was so stable for so long that we basically forgot about it. Which is to say, we totally neglected it. And then one day we experienced a catastrophic incident.

Our main cluster seized up. All graphs, charts, and dashboards went blank. Suddenly we were totally in the dark — and so were our customers. The incident lasted almost four hours, and in the end, an unsatisfactory number of customers experienced some kind of data loss. It was an epic disaster. Our Kafka infrastructure had been running like a champ for more than a year and suddenly it had ground to a halt.

This happened several years ago, but to this day we still refer to the incident as the “Kafkapocalypse.”

Ben has a couple interesting stories and some good rules of thumb for maintaining a Kafka cluster.

Related Posts

Flink and Stateful Streaming

Himanshu Gupta explains some of the benefits Apache Flink offers for stateful streaming applicatons: The 2 main types of stream processing done are:1. Stateless: Where every event is handled completely independent from the preceding events.2. Stateful: Where a “state” is shared between events and therefore past events can influence the way current events are processed. […]

Read More

Performance Testing Aiven Kafka

Heikki Nousiainen tests the Aiven platform’s Kafka implementation on different cloud providers at different service levels: We used a single topic for our write operations with a partition count set to either 3 or 6, depending on the number of brokers in each test cluster. As the test clusters were regular Aiven services, the partitions […]

Read More

Categories

December 2017
MTWTFSS
« Nov Jan »
 123
45678910
11121314151617
18192021222324
25262728293031