Kafka / Flink – Curated SQL

Diskless Kafka in Practice

Published 2025-06-19 by Kevin Feasel

I joined Aiven as a Developer Advocate in May, shortly after the Kafka Improvement Proposal KIP-1150: Diskless Topics was announced, which reduces the total cost of ownership of Kafka by up to 80%! It was very exciting to join Aiven just as the streaming team were making this major contribution to open source, but I wanted to take my time to understand the KIP before sharing my thoughts.

In this article I’ll share my first impressions of Diskless Kafka, walk you through a simple example you can use to experiment with Diskless, and highlight some of the great resources that are out there for learning about the topic. First though, what actually is Diskless Kafka?

Click through for that answer, as well as more.

Consumer Group Rebalancing in Kafka and KIP-848

Published 2025-06-04 by Kevin Feasel

Jonathan Lacefield gives us a heads-up:

Historically, Kafka has relied on what we now call the “classic” rebalance protocol. This protocol evolved, as it was initially dominated by the “eager” assignment strategy. Eager rebalancing worked on a stop-the-world principle: Any change in group membership (consumer joining/leaving) or topic metadata triggered a complete halt. All consumers revoked their partitions, a leader computed a new assignment, and partitions were redistributed before processing could resume. This caused significant downtime, especially in dynamic environments.

To mitigate this, the cooperative assignment strategy was introduced within the classic protocol. Cooperative rebalancing reduced downtime by allowing consumers to keep partitions unaffected by the rebalance, revoking only those needing reassignment.

Read on to learn about some of the challenges that exist with rebalancing, and what KIP-848 promises to do.
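
To make the eager versus cooperative distinction above a bit more concrete, here is a toy sketch in Python. It is not Kafka's actual assignor logic: the function names, the consumer names, and the simple balancing loop are all assumptions made up for illustration. It only counts how many partitions each approach revokes when a third consumer joins a two-consumer group.

```python
# Toy model only: real Kafka assignors are considerably more involved.
# All names here (sticky_rebalance, c1, c2, c3) are hypothetical.

def sticky_rebalance(old, consumers):
    """Keep existing ownership where possible and move only what balance requires."""
    new = {c: list(old.get(c, [])) for c in consumers}
    # Partitions owned by consumers that left the group need a new home.
    orphaned = [p for c, parts in old.items() if c not in consumers for p in parts]
    for p in orphaned:
        least = min(new, key=lambda c: len(new[c]))
        new[least].append(p)
    # Shift partitions from the most loaded to the least loaded consumer
    # until no two consumers differ by more than one partition.
    while True:
        most = max(new, key=lambda c: len(new[c]))
        least = min(new, key=lambda c: len(new[c]))
        if len(new[most]) - len(new[least]) <= 1:
            return new
        new[least].append(new[most].pop())

def eager_revocations(old):
    """Eager strategy: every consumer gives up every partition it holds."""
    return {c: list(parts) for c, parts in old.items()}

def cooperative_revocations(old, new):
    """Cooperative strategy: revoke only partitions that move to another consumer."""
    return {c: [p for p in parts if p not in new.get(c, [])]
            for c, parts in old.items()}

old = {"c1": [0, 1, 2], "c2": [3, 4, 5]}
new = sticky_rebalance(old, ["c1", "c2", "c3"])
eager = eager_revocations(old)
cooperative = cooperative_revocations(old, new)
print("eager revokes:", sum(len(v) for v in eager.values()), "partitions")
print("cooperative revokes:", sum(len(v) for v in cooperative.values()), "partitions")
```

In this toy run the eager approach revokes all six partitions, while the cooperative approach revokes only the two that actually move to the new consumer.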

Kafka Connector for Cosmos DB

Published 2025-05-22 by Kevin Feasel

Sudhindra Sheshadrivasan announces a new connector has become generally available:

We’re excited to announce the General Availability (GA) of the Confluent fully managed V2 connector for Apache Kafka® for Azure Cosmos DB! This release marks a major milestone in our mission to simplify real-time data streaming from and to Azure Cosmos DB using Apache Kafka®.

The V2 connector is now production-ready and available directly from the Confluent Cloud connector catalog. This managed connector allows you to seamlessly integrate Azure Cosmos DB with your Kafka-powered event streaming architecture—without worrying about provisioning, scaling, or managing the connector infrastructure.

Read on to learn more about the new connector and what it takes to hook everything up.

Building a Multi-Agent Orchestrator with Flink and Kafka

Published 2025-05-05 by Kevin Feasel

Sean Falconer builds an orchestration engine:

Just as some problems are too big for one person to solve, some tasks are too complex for a single artificial intelligence (AI) agent to handle. Instead, the best approach is to decompose problems into smaller, specialized units so that multiple agents can work together as a team.

This is the foundation of a multi-agent system—networks of agents, each with a specific role, collaborating to solve larger problems.

Read on for the overview. There’s also a code repository and a free e-book on the topic.

Kafka Data Exploration with Tableflow

Published 2025-04-29 by Kevin Feasel

Robin Moffatt does some exploratory data analysis:

One of the challenges that I’d always had when it came to building streaming data pipelines is that once data is in a Kafka topic, it becomes trickier to query. Whether limited by the available tools to do this or the speed of access, querying Kafka is just not a smooth experience.

This blog post will show you a really nice way of exploring and validating data in Apache Kafka®. We’ll use Tableflow to expose the Kafka topics as Apache Iceberg™️ tables and then query them using standard SQL tools.

Click through for the demonstration using a real dataset.

Diskless Topics in Apache Kafka

Published 2025-04-25 by Kevin Feasel

Filip Yonov and Josep Prat work through a challenge:

KIP-1150 isn’t a distant, strange planet; it just reroutes Kafka’s entire replication pathway from broker disks to cloud object storage. Flip one topic flag and your data bypasses local drives altogether:

No disks to babysit: Hot-partition drama, IOPS ceilings, and multi-hour rebalances vanish—freeing up time (for more blog posts).

Cloud bill trimmed by up to 80%: Object storage replaces triple-replicated setups with pricey SSDs and every byte of cross‑zone replication, erasing the “cloud tax”.

Scale in real time: With nothing pinned to brokers, you can spin brokers up (or down) in seconds to absorb traffic spikes.

Because Diskless is built into Kafka (no client changes, no forks), we had to solve a 4D puzzle: How do you make a Diskless topic behave exactly like a Kafka one, inside the same cluster, without rewriting Kafka? This blog is a first‑principles deep dive into the thought process and trade‑offs that shaped the proposal.

Click through for a deep dive on this from the perspective of a platform host.

Kafka Consumer Offset Changes with KIP-1094

Published 2025-04-23 by Kevin Feasel

Alieh Saeedi looks at a change in Apache Kafka 4.0.0:

Consumer offsets are at the heart of Apache Kafka®’s robust data handling capabilities, as they determine how data is consumed, reprocessed, or skipped across topics and partitions. In this comprehensive guide, we delve into the intricacies of Kafka offsets, covering everything from the necessity of manual offset control to the nuanced challenges posed by offset management in distributed environments. We further explore the solutions and enhancements introduced by KIP-1094 (available in Kafka 4.0.0), offering a closer look at how it addresses these challenges by enabling more accurate and reliable offset and leader epoch information retrieval.

Click through for an overview of how consumer behavior works, as well as what KIP-1094 does.

Troubleshooting an Apache Flink Job Not Producing Results

Published 2025-04-18 by Kevin Feasel

Wade Waldron digs in:

Imagine that you have built an Apache Flink® job. It collects records from Apache Kafka®, performs a time-based aggregation on those records, and emits a new record to a different topic. With your excitement high, you run the job for the first time, and are disappointed to discover that nothing happens. You check the input topic and see the data flowing, but when you look at the output topic, it’s empty.

In many cases, this is an indication that there is a problem with watermarks. But what is a watermark?

Read on for a primer on watermarks, followed by an explanation of the common solution to the problem Wade describes.
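
As a rough illustration of why a time-based aggregation can sit silent while input keeps flowing, here is a small Python sketch. It is a simplified model and not Flink's implementation: the window size, the out-of-orderness bound, the partition names, and every function name are assumptions chosen for the example. The idea it shows is that a window only emits once the watermark passes the window's end, and that the watermark is the minimum across input partitions, so one partition with no data can hold everything back.

```python
import math

WINDOW_MS = 60_000           # hypothetical 1-minute tumbling window
MAX_OUT_OF_ORDER_MS = 5_000  # hypothetical bounded out-of-orderness

def partition_watermark(event_times):
    """Watermark for one partition: latest event time minus the allowed disorder."""
    if not event_times:
        return -math.inf  # no events yet, so this partition reports no progress
    return max(event_times) - MAX_OUT_OF_ORDER_MS

def job_watermark(partitions):
    """The downstream watermark is the minimum over all input partitions."""
    return min(partition_watermark(ts) for ts in partitions.values())

def closed_windows(partitions):
    """End timestamps of windows that contain data and that the watermark has passed."""
    wm = job_watermark(partitions)
    ends = {(t // WINDOW_MS + 1) * WINDOW_MS
            for ts in partitions.values() for t in ts}
    return sorted(end for end in ends if end <= wm)

# Both partitions receive events: the first two windows can emit results.
busy = {"p0": [1_000, 30_000, 70_000, 130_000], "p1": [2_000, 65_000, 128_000]}
# One partition is empty: the watermark never advances and nothing is emitted.
idle = {"p0": [1_000, 30_000, 70_000, 130_000], "p1": []}

print(closed_windows(busy))
print(closed_windows(idle))
```

In the second case the input partition with data looks perfectly healthy, yet no window ever closes, which matches the symptom described in the excerpt.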

Kafka Deployment in a KRaft World

Published 2025-03-26 by Kevin Feasel

Sven Loesekann deploys Apache Kafka 4.0:

With KRaft for Kafka, ZooKeeper is no longer needed. KRaft is a protocol to select a leader among several server instances. That makes the Kafka setup much easier.

The new configuration is shown using the example of the MovieManager project.

Click through to see how you can install and configure Kafka without taking a dependency on ZooKeeper.

Apache Kafka 4.0 Now Available

Published 2025-03-25 by Kevin Feasel

David Jacot announces a milestone release for Apache Kafka:

Apache Kafka 4.0 is a significant milestone, marking the first major release to operate entirely without Apache ZooKeeper™️. By running in KRaft mode by default, Kafka simplifies deployment and management, eliminating the complexity of maintaining a separate ZooKeeper ensemble. This change significantly reduces operational overhead, enhances scalability, and streamlines administrative tasks. We want to take this as an opportunity to express our gratitude to the ZooKeeper community and say thank you! ZooKeeper was the backbone of Kafka for more than 10 years, and it did serve Kafka very well. Kafka would most likely not be what it is today without it. We don’t take this for granted, and highly appreciate all of the hard work the community invested to build ZooKeeper. Thank you!

There are some other big items in Kafka 4.0 and you can see more in the post’s changelog.

Category: Kafka / Flink