Kafka / Flink – Page 2

Kafka Connector for Cosmos DB

Published 2025-05-22 by Kevin Feasel

Sudhindra Sheshadrivasan announces that a new connector has become generally available:

We’re excited to announce the General Availability (GA) of the Confluent fully managed V2 connector for Apache Kafka® for Azure Cosmos DB! This release marks a major milestone in our mission to simplify real-time data streaming from and to Azure Cosmos DB using Apache Kafka®.

The V2 connector is now production-ready and available directly from the Confluent Cloud connector catalog. This managed connector allows you to seamlessly integrate Azure Cosmos DB with your Kafka-powered event streaming architecture—without worrying about provisioning, scaling, or managing the connector infrastructure.

Read on to learn more about the new connector and what it takes to hook everything up.

Building a Multi-Agent Orchestrator with Flink and Kafka

Published 2025-05-05 by Kevin Feasel

Sean Falconer builds an orchestration engine:

Just as some problems are too big for one person to solve, some tasks are too complex for a single artificial intelligence (AI) agent to handle. Instead, the best approach is to decompose problems into smaller, specialized units so that multiple agents can work together as a team.

This is the foundation of a multi-agent system—networks of agents, each with a specific role, collaborating to solve larger problems.

Read on for the overview. There’s also a code repository and a free e-book on the topic.

Kafka Data Exploration with Tableflow

Published 2025-04-29 by Kevin Feasel

Robin Moffatt does some exploratory data analysis:

One of the challenges that I’d always had when it came to building streaming data pipelines is that once data is in a Kafka topic, it becomes trickier to query. Whether limited by the available tools to do this or the speed of access, querying Kafka is just not a smooth experience.

This blog post will show you a really nice way of exploring and validating data in Apache Kafka®. We’ll use Tableflow to expose the Kafka topics as Apache Iceberg™️ tables and then query them using standard SQL tools.

Click through for the demonstration using a real dataset.

Diskless Topics in Apache Kafka

Published 2025-04-25 by Kevin Feasel

Filip Yonov and Josep Prat work through a challenge:

KIP-1150 isn’t a distant, strange planet; it just reroutes Kafka’s entire replication pathway from broker disks to cloud object storage. Flip one topic flag and your data bypasses local drives altogether:

No disks to babysit: Hot-partition drama, IOPS ceilings, and multi-hour rebalances vanish—freeing up time (for more blog posts).

Cloud bill trimmed by up to 80%: Object storage replaces triple-replicated setups with pricey SSDs and every byte of cross‑zone replication, erasing the “cloud tax”.

Scale in real time: With nothing pinned to brokers, you can spin brokers up (or down) in seconds to absorb traffic spikes.

Because Diskless is built into Kafka (no client changes, no forks), we had to solve a 4D puzzle: How do you make a Diskless topic behave exactly like a Kafka one, inside the same cluster, without rewriting Kafka? This blog is a first-principles deep dive into the thought process and trade-offs that shaped the proposal.

Click through for a deep dive on this from the perspective of a platform host.

Kafka Consumer Offset Changes with KIP-1094

Published 2025-04-23 by Kevin Feasel

Alieh Saeedi looks at a change in Apache Kafka 4.0.0:

Consumer offsets are at the heart of Apache Kafka®’s robust data handling capabilities, as they determine how data is consumed, reprocessed, or skipped across topics and partitions. In this comprehensive guide, we delve into the intricacies of Kafka offsets, covering everything from the necessity of manual offset control to the nuanced challenges posed by offset management in distributed environments. We further explore the solutions and enhancements introduced by KIP-1094 (available in Kafka 4.0.0), offering a closer look at how it addresses these challenges by enabling more accurate and reliable offset and leader epoch information retrieval.

Click through for an overview of how consumer behavior works, as well as what KIP-1094 does.
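
To make the offset discussion a bit more concrete, here is a toy, single-partition model in Python. It is only a sketch under stated assumptions: the `Entry` class and `poll` function are hypothetical and are not part of any Kafka client. The one fact it leans on is that transaction markers occupy offsets in the log but are never handed to the application, so "offset of the last record I saw, plus one" is not always the consumer's true next position.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    offset: int
    is_control: bool  # True for a transaction marker, which applications never see
    value: str = ""


def poll(log, position, max_entries):
    """Toy poll: return (visible_records, next_position) for one partition."""
    batch = [e for e in log if e.offset >= position][:max_entries]
    records = [e for e in batch if not e.is_control]
    # The next position moves past everything that was read, markers included.
    next_position = batch[-1].offset + 1 if batch else position
    return records, next_position


log = [
    Entry(0, False, "a"),
    Entry(1, False, "b"),
    Entry(2, True),        # hypothetical commit marker from a transactional producer
    Entry(3, False, "c"),
]

records, next_position = poll(log, position=0, max_entries=3)
naive_commit = records[-1].offset + 1  # derived only from the records we saw

print(naive_commit, next_position)
```

In this toy the two values disagree (2 versus 3). Committing the smaller one would not lose data here, but it is the kind of mismatch that makes manually tracked offsets less accurate than what the client itself knows.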

Troubleshooting an Apache Flink Job Not Producing Results

Published 2025-04-18 by Kevin Feasel

Wade Waldron digs in:

Imagine that you have built an Apache Flink® job. It collects records from Apache Kafka®, performs a time-based aggregation on those records, and emits a new record to a different topic. With your excitement high, you run the job for the first time, and are disappointed to discover that nothing happens. You check the input topic and see the data flowing, but when you look at the output topic, it’s empty.

In many cases, this is an indication that there is a problem with watermarks. But what is a watermark?

Read on for a primer on watermarks, followed by an explanation of the common solution to the problem Wade describes.
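
For a rough feel of why a stalled watermark leaves the output topic empty, here is a minimal Python simulation. It is not Flink code and may not match the cause the linked post focuses on: the window size, the out-of-orderness bound, and the `run` function are all assumptions for illustration, and the scenario modeled is one commonly cited culprit, an input partition that never receives data.

```python
from collections import defaultdict

WINDOW_MS = 10_000           # tumbling window size (assumed)
MAX_OUT_OF_ORDER_MS = 2_000  # allowed out-of-orderness (assumed)


def run(events, partitions):
    """events: in-order list of (partition, event_time_ms, value) tuples.

    Returns the (window_start, sum) results emitted by a simplified
    event-time tumbling window.
    """
    max_ts = {p: None for p in partitions}  # highest event time seen per partition
    windows = defaultdict(int)              # window start -> running sum
    emitted = []
    for partition, ts, value in events:
        windows[ts - ts % WINDOW_MS] += value
        seen = max_ts[partition]
        max_ts[partition] = ts if seen is None else max(seen, ts)
        # The combined watermark is the minimum across all partitions, so a
        # partition with no data at all holds it back indefinitely.
        if any(t is None for t in max_ts.values()):
            continue
        watermark = min(max_ts.values()) - MAX_OUT_OF_ORDER_MS
        for start in sorted(windows):
            if start + WINDOW_MS <= watermark:  # window is complete, emit it
                emitted.append((start, windows.pop(start)))
    return emitted


events = [(0, 1_000, 1), (0, 5_000, 2), (0, 13_000, 3)]
print(run(events, partitions=[0]))     # one active partition
print(run(events, partitions=[0, 1]))  # partition 1 never receives data
```

With a single active partition the first window closes once an event at 13 seconds pushes the watermark past 10 seconds. Add a second, silent partition and nothing is ever emitted, even though data keeps flowing in. In real Flink jobs, `WatermarkStrategy.withIdleness` is one way to keep an idle source from holding the watermark back.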

Kafka Deployment in a KRaft World

Published 2025-03-26 by Kevin Feasel

Sven Loesekann deploys Apache Kafka 4.0:

With KRaft for Kafka, ZooKeeper is no longer needed. KRaft is a protocol to select a leader among several server instances. That makes the Kafka setup much easier.

The new configuration is shown using the example of the MovieManager project.

Click through to see how you can install and configure Kafka without taking a dependency on ZooKeeper.

Apache Kafka 4.0 Now Available

Published 2025-03-25 by Kevin Feasel

David Jacot announces a milestone release for Apache Kafka:

Apache Kafka 4.0 is a significant milestone, marking the first major release to operate entirely without Apache ZooKeeper™️. By running in KRaft mode by default, Kafka simplifies deployment and management, eliminating the complexity of maintaining a separate ZooKeeper ensemble. This change significantly reduces operational overhead, enhances scalability, and streamlines administrative tasks. We want to take this as an opportunity to express our gratitude to the ZooKeeper community and say thank you! ZooKeeper was the backbone of Kafka for more than 10 years, and it did serve Kafka very well. Kafka would most likely not be what it is today without it. We don’t take this for granted, and highly appreciate all of the hard work the community invested to build ZooKeeper. Thank you!

There are some other big items in Kafka 4.0 and you can see more in the post’s changelog.

Deleting a Topic in Apache Kafka

Published 2025-03-14 by Kevin Feasel

Staff writers in the Confluent writing mines perform a deletion:

Can you delete a topic in Apache Kafka®? The answer is yes—but the process depends on your Kafka configuration and the environment in which you are working (i.e., if it is self-managed, hosted in the cloud, or a fully managed Kafka service like Confluent Cloud).

Read on to see how you can do so, as well as some recommendations around deletion and topic management.

Tips for Scaling Apache Kafka

Published 2025-02-11 by Kevin Feasel

Narendra Lakshmana Gowda tunes a Kafka cluster:

Apache Kafka is known for its ability to process a huge quantity of events in real time. However, to handle millions of events, we need to follow certain best practices while implementing both Kafka producer services and consumer services.

Before you start using Kafka in your projects, let’s understand when to use Kafka:

Much of the advice is pretty standard for performance tuning in Kafka, like setting batch size and linger time on the producer or managing consumers in a consumer group.
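
As a small illustration of the producer side of that advice, the sketch below collects the two settings in a Python dictionary and renders them as properties lines. `batch.size` and `linger.ms` are real Kafka producer configuration names, but the values shown and the `to_properties` helper are assumptions for illustration, not recommendations from the linked article.

```python
# Hypothetical starting values; tune against your own workload.
producer_tuning = {
    "batch.size": 64 * 1024,  # upper bound, in bytes, on a per-partition batch
    "linger.ms": 20,          # how long to wait for a batch to fill before sending
}


def to_properties(config):
    """Render a config dict as sorted key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


print(to_properties(producer_tuning))
```

Larger batches and a small linger generally trade a little latency for better throughput, which is why the two settings tend to be tuned together.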
