
Category: Kafka / Flink

Kafka Consumer Offset Changes with KIP-1094

Alieh Saeedi looks at a change in Apache Kafka 4.0.0:

Consumer offsets are at the heart of Apache Kafka®’s robust data handling capabilities, as they determine how data is consumed, reprocessed, or skipped across topics and partitions. In this comprehensive guide, we delve into the intricacies of Kafka offsets, covering everything from the necessity of manual offset control to the nuanced challenges posed by offset management in distributed environments. We further explore the solutions and enhancements introduced by KIP-1094 (available in Kafka 4.0.0), offering a closer look at how it addresses these challenges by enabling more accurate and reliable offset and leader epoch information retrieval.

Click through for an overview of how consumer behavior works, as well as what KIP-1094 does.
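
For a sense of what manual offset control looks like in practice, here is a minimal, hypothetical Java sketch — broker address, topic, and group id are all made up. It commits the next offset by hand, carrying the record’s leader epoch along; per the post, KIP-1094 is about letting the consumer report that next-offset-and-epoch pair accurately itself, rather than you having to derive it from individual records as done here.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ManualOffsetConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("group.id", "demo-group");              // assumption: made-up group id
        props.put("enable.auto.commit", "false");         // take manual control of offsets
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"));    // assumption: made-up topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                    // Commit the *next* offset to read, plus the leader epoch we saw.
                    // KIP-1094 (Kafka 4.0) aims to make this pair retrievable
                    // accurately from the consumer rather than reconstructed here.
                    TopicPartition tp = new TopicPartition(record.topic(), record.partition());
                    Optional<Integer> epoch = record.leaderEpoch();
                    consumer.commitSync(Map.of(tp,
                            new OffsetAndMetadata(record.offset() + 1, epoch, "")));
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("%s-%d@%d: %s%n",
                record.topic(), record.partition(), record.offset(), record.value());
    }
}
```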


Troubleshooting an Apache Flink Job Not Producing Results

Wade Waldron digs in:

Imagine that you have built an Apache Flink® job. It collects records from Apache Kafka®, performs a time-based aggregation on those records, and emits a new record to a different topic. With your excitement high, you run the job for the first time, and are disappointed to discover that nothing happens. You check the input topic and see the data flowing, but when you look at the output topic, it’s empty.

In many cases, this is an indication that there is a problem with watermarks. But what is a watermark?

Read on for a primer on watermarks, followed by an explanation of the common solution to the problem Wade describes.
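
To make the fix concrete, here is a hedged Java sketch of the usual remedy: an explicit watermark strategy with bounded out-of-orderness plus an idleness timeout. The in-memory source stands in for Kafka, and all names and durations are illustrative, not Wade’s exact code.

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WatermarkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for a Kafka source: (key, eventTimeMillis) pairs.
        DataStream<Tuple2<String, Long>> events = env.fromElements(
                Tuple2.of("a", 1_000L), Tuple2.of("a", 4_000L), Tuple2.of("a", 9_000L));

        DataStream<Tuple2<String, Long>> withWatermarks = events.assignTimestampsAndWatermarks(
                WatermarkStrategy
                        // Watermark trails the highest event time seen by 5 seconds,
                        // tolerating that much out-of-orderness.
                        .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        // Tell Flink where event time lives in the record.
                        .withTimestampAssigner((event, ts) -> event.f1)
                        // Without this, one idle Kafka partition can hold the
                        // watermark back indefinitely and windows never close.
                        .withIdleness(Duration.ofMinutes(1)));

        withWatermarks
                .keyBy(e -> e.f0)
                .window(TumblingEventTimeWindows.of(Time.seconds(5)))
                .sum(1)
                .print();

        env.execute("watermark-sketch");
    }
}
```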


Apache Kafka 4.0 Now Available

David Jacot announces a milestone release for Apache Kafka:

Apache Kafka 4.0 is a significant milestone, marking the first major release to operate entirely without Apache ZooKeeper™️. By running in KRaft mode by default, Kafka simplifies deployment and management, eliminating the complexity of maintaining a separate ZooKeeper ensemble. This change significantly reduces operational overhead, enhances scalability, and streamlines administrative tasks. We want to take this as an opportunity to express our gratitude to the ZooKeeper community and say thank you! ZooKeeper was the backbone of Kafka for more than 10 years, and it did serve Kafka very well. Kafka would most likely not be what it is today without it. We don’t take this for granted, and highly appreciate all of the hard work the community invested to build ZooKeeper. Thank you!

There are some other big items in Kafka 4.0, and you can see more in the post’s changelog.


Deleting a Topic in Apache Kafka

Staff writers in the Confluent writing mines perform a deletion:

Can you delete a topic in Apache Kafka®? The answer is yes, but the process depends on your Kafka configuration and the environment in which you are working (i.e., if it is self-managed, hosted in the cloud, or a fully managed Kafka service like Confluent Cloud).

Read on to see how you can do so, as well as some recommendations around deletion and topic management.
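
As a quick illustration of the self-managed case, here is a minimal sketch using Kafka’s Java Admin API; the broker address and topic name are placeholders. The same operation is available from the command line via kafka-topics.sh --delete.

```java
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class DeleteTopicExample {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        // Assumption: a self-managed broker on localhost; managed services may
        // require additional auth settings or expose deletion via UI/CLI instead.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Deletion is asynchronous broker-side: the topic is marked for
            // deletion, and brokers need delete.topic.enable=true (the default).
            admin.deleteTopics(List.of("topic-to-delete"))
                 .all()
                 .get(); // block until the controller accepts the request
            System.out.println("Deletion requested for topic-to-delete");
        }
    }
}
```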


Tips for Scaling Apache Kafka

Narendra Lakshmana Gowda tunes a Kafka cluster:

Apache Kafka is known for its ability to process a huge quantity of events in real time. However, to handle millions of events, we need to follow certain best practices while implementing both Kafka producer services and consumer services.

Before you start using Kafka in your projects, let’s understand when to use Kafka:

Much of the advice is pretty standard for performance tuning in Kafka, like setting batch size and linger time on the producer or managing consumers in a consumer group.
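
To put those knobs in context, here is an illustrative producer configuration in Java; the specific values are examples to reason from, not recommendations for your workload.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TunedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");

        // Accumulate up to 64 KB per partition before sending...
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);
        // ...or wait up to 20 ms for a batch to fill: latency traded for throughput.
        props.put(ProducerConfig.LINGER_MS_CONFIG, 20);
        // Compress whole batches; lz4 is a common low-overhead choice.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        // acks=all plus idempotence for durability without duplicate writes.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "key", "value")); // made-up topic
        } // close() flushes any lingering batch
    }
}
```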


Handling Errors in Apache Flink Apps

Alexis Tekin and Jeremy Ber handle an error:

Data streaming applications continuously process incoming data, much like a never-ending query against a database. Unlike traditional database queries where you request data one time and receive a single response, streaming data applications constantly receive new data in real time. This introduces some complexity, particularly around error handling. This post discusses the strategies for handling errors in Apache Flink applications. However, the general principles discussed here apply to stream processing applications at large.

Read on to see how this all works when you’re hosting a Flink application. This directly relates to Flink applications that live in AWS, though very little in the article is AWS-specific.
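
One common Flink pattern for the poison-record class of errors (not necessarily the article’s exact approach) is a dead-letter side output: catch the failure per element and divert the bad record rather than failing the whole job. A hedged sketch:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class DeadLetterSketch {
    // Anonymous subclass so Flink can capture the element type at runtime.
    private static final OutputTag<String> DEAD_LETTERS = new OutputTag<String>("dead-letters") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> raw = env.fromElements("1", "2", "oops", "3"); // stand-in source

        SingleOutputStreamOperator<Integer> parsed = raw.process(
                new ProcessFunction<String, Integer>() {
                    @Override
                    public void processElement(String value, Context ctx, Collector<Integer> out) {
                        try {
                            out.collect(Integer.parseInt(value));
                        } catch (NumberFormatException e) {
                            // Don't fail the job for one bad record: divert it.
                            ctx.output(DEAD_LETTERS, value);
                        }
                    }
                });

        parsed.print();                             // healthy records
        parsed.getSideOutput(DEAD_LETTERS).print(); // in practice, sink to a DLQ topic

        env.execute("dead-letter-sketch");
    }
}
```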


Musings on the State of Apache Kafka and Apache Flink

Adron Hall shares some thoughts:

I’ve worked with (** references at end of article) a number of Apache projects over the years, often pretty closely: Apache Cassandra, Apache Flink, Apache Kafka, Apache ZooKeeper, and numerous others. But for the last few years I’ve not been immediately hands-on with the technology. A few questions popped up recently that, fortunately, I was able to answer based on existing knowledge, but it made me really curious about what the SITREP (situation report) is for the Apache Kafka and Flink projects TODAY, i.e., rolling into 2025! The following is a quick dive into the history and then the latest details (and drama?) with Apache Kafka, Flink, and tangentially some other projects (ZooKeeper?).

Click through to see how the pieces fit together.


Securing a Kafka Ecosystem

Riya has a breakdown of how to protect your Apache Kafka installation and resources around it:

Apache Kafka is the backbone of many real-time data pipelines, making security an essential aspect of its deployment. Protecting your Kafka ecosystem involves implementing encryption to safeguard data, authentication to verify user identities, and authorization to control access. This guide provides a comprehensive overview of these three pillars of securing Kafka, complete with code examples to help you implement best practices.

Click through for demonstrations of encryption, authentication, and authorization.
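
For a flavor of the client side of all three pillars, here is an illustrative Java configuration; hostnames, store paths, and credentials are placeholders. It pairs TLS for encryption with SASL/SCRAM for authentication, while authorization is enforced broker-side through ACLs.

```java
import java.util.Properties;

public class SecureClientConfig {
    public static Properties secureProps() {
        Properties props = new Properties();
        // Assumption: the broker advertises a SASL_SSL listener on port 9093.
        props.put("bootstrap.servers", "broker.example.com:9093");

        // Encryption in transit: TLS, with a truststore holding the broker's CA.
        props.put("security.protocol", "SASL_SSL");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit"); // placeholder

        // Authentication: SCRAM credentials presented over the TLS channel.
        props.put("sasl.mechanism", "SCRAM-SHA-512");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                + "username=\"app-user\" password=\"app-secret\";"); // placeholders

        // Authorization happens broker-side: the principal above still needs
        // ACLs (e.g., granted via kafka-acls.sh) before it can read or write topics.
        return props;
    }
}
```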


Handling a Consumer Fetch Request in Kafka

Multiple Confluent employees (who apparently don’t get to have names this time around) wrap up a series:

It’s been a long time coming, but we’ve finally arrived at the fourth and final installment of our blog series. In this series, we’ve been peeling back the layers of Apache Kafka® to get a deeper understanding of how best to interact with the cluster using producer and consumer clients.

Read on for the final part, as well as links to previous parts if you missed them.
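
If you want to experiment while reading, these are the consumer settings that most directly shape the fetch requests the series covers; the values below are illustrative rather than prescriptive.

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;

public class FetchTuning {
    public static Properties fetchProps() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption

        // Broker holds the fetch until at least 1 KB is ready (default is 1 byte)...
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1024);
        // ...or until 500 ms have passed, whichever comes first.
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);
        // Upper bounds on how much data one fetch response may carry,
        // overall and per partition.
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 50 * 1024 * 1024);
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 1024 * 1024);

        return props;
    }
}
```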
