Kafka / Flink – Curated SQL

The Downside of Zero-Copy Integration between Kafka and Iceberg

Published 2025-10-16 by Kevin Feasel

Over the past few months, I’ve seen a growing number of posts on social media promoting the idea of a “zero-copy” integration between Apache Kafka and Apache Iceberg. The idea is that Kafka topics could live directly as Iceberg tables. On the surface it sounds efficient: one copy of the data, unified access for both streaming and analytics. But from a systems point of view, I think this is the wrong direction for the Apache Kafka project. In this post, I’ll explain why.

Read on for an explanation of what “zero-copy” means here, as well as Jack’s position on the matter. I think it’s a solid argument and worth the read.

Cross-Cloud Data Replication with Confluent

Published 2025-10-14 by Kevin Feasel

Ahmed Saef Zamzam and Hannah Miao move some data:

Cross-cloud replication over private networks is powered by Cluster Linking, Confluent’s fully managed, offset-preserving replication service that mirrors topics across clusters. Cluster Linking already makes it simple to connect environments across regions, clouds, and hybrid deployments with near-zero data loss. Now, with private cross-cloud replication, the possibilities expand even further—enabling secure multicloud data sharing, disaster recovery, and compliance use cases that many organizations, particularly those in regulated industries, have struggled to solve for years.

Click through to see how it works and how it can beat mechanisms that existed prior to it.

Scaling Kafka Streams Applications

Published 2025-10-01 by Kevin Feasel

The Confluent employee mines have a new article:

As the adoption of real-time data processing accelerates, the ability to scale stream processing applications to handle high-volume traffic is paramount. Apache Kafka®, the de facto standard for distributed event streaming, provides a powerful and scalable library in Kafka Streams for building such applications.

Scaling a Kafka Streams application effectively involves a multi-faceted approach that encompasses architectural design, configuration tuning, and diligent monitoring. This guide will walk you through the essential strategies and best practices to ensure your Kafka Streams applications can gracefully handle massive throughput.

The post gets into some details around the kinds of limits you’ll hit during scaling, scale-up versus scale-out, and configuration settings to help with that scale.

What’s New in Apache Kafka 4.1.0

Published 2025-09-05 by Kevin Feasel

Mickael Maison lays out some changes:

The Apache Kafka community is proud to announce the release of Apache Kafka® 4.1.0. This blog post highlights the many new features and improvements included in this release. For a full list of changes, be sure to check the release notes.

Queues for Kafka (KIP-932) is now in preview. It’s still not ready for production, but you can start evaluating and testing it. See the preview release notes for more details.

This release also introduces a new Streams Rebalance Protocol (KIP-1071) in early access. It is based on the new consumer group protocol (KIP-848).

Read on for another 15 or so completed items.

Lessons Learned on Migrating to Apache Kafka

Published 2025-08-22 by Kevin Feasel

Ravi Teja Thutari shares some advice:

The legacy e-commerce platform was a PHP-based monolith handling catalog, orders, inventory, and customer data. With business growth, the monolith could not scale further. Maintaining feature velocity was hard because every change risked the entire system. We needed scalability, resilience, and faster releases. Shifting to event-driven microservices promised to address these issues. In practice we adopted Kafka on Kubernetes, similar to other online retailers.

Our priorities were (1) decoupling services so each team could deploy independently, (2) modeling business events consistently across domains, and (3) ensuring reliable delivery at scale (with retries and DLQs for failures). As a starting point, we documented key domain events (e.g. OrderCreated, PaymentProcessed, InventoryAllocated) and sketched a target architecture. Like other high-traffic systems, we planned horizontal scaling: adding Kafka brokers and topic partitions to match consumer parallelism. We also planned for observability from Day 1 (metrics, logs, traces) to monitor performance and troubleshoot issues.
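As a rough illustration of the "retries and DLQs" idea in that excerpt, here is a minimal in-memory sketch. It is not the author's implementation and it does not use a Kafka client: the function name, the handler, and the retry limit of three are all assumptions made for the example, and the dead-letter "queue" is just a Python list.

```python
def process_with_retries(events, handler, max_attempts=3):
    """Try each event up to max_attempts times; park persistent failures in a DLQ."""
    processed, dead_letters = [], []
    for event in events:
        last_error = None
        for _ in range(max_attempts):
            try:
                processed.append(handler(event))
                break  # success: stop retrying this event
            except Exception as exc:  # hypothetical: treat every error as retryable
                last_error = exc
        else:
            # The loop ended without a break, so every attempt failed.
            dead_letters.append((event, str(last_error)))
    return processed, dead_letters


attempts = {}


def handler(event):
    """Hypothetical handler: one transient failure, one permanent failure."""
    attempts[event] = attempts.get(event, 0) + 1
    if event == "PaymentProcessed" and attempts[event] < 2:
        raise RuntimeError("transient failure")
    if event == "InventoryAllocated":
        raise ValueError("malformed payload")
    return event.lower()


events = ["OrderCreated", "PaymentProcessed", "InventoryAllocated"]
processed, dead_letters = process_with_retries(events, handler)
```

The transient failure succeeds on its second attempt, while the event that always fails ends up in the dead-letter list instead of blocking the rest of the stream.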

Read on for more information about how that migration went.

Kafka: From ZooKeeper to KRaft

Published 2025-08-14 by Kevin Feasel

Phil Yang lays out how to make a migration:

Apache Kafka has made a landmark shift in KIP-500 with the introduction of Kafka Raft (KRaft) mode, eliminating the dependency on Apache ZooKeeper for metadata management. With KRaft, the Kafka nodes themselves can be configured as KRaft controllers – which allow for metadata management and leader elections to work all within just Kafka, resulting in significant performance improvements. This cemented KRaft’s status as the metadata management protocol for Kafka moving forward.

This blog will guide you through the importance of this transition, what migrating from ZooKeeper to KRaft entails, and how we, at NetApp Instaclustr, make this seamless with our automated, streamlined process that is built into our platform.

Click through to see how you can update your own clusters, whether you’re using the Instaclustr service or not.

Retry Resiliency in Apache Kafka Pipelines

Published 2025-07-24 by Kevin Feasel

Ravi Teja Thutari explains the value of idempotence in moving data between systems:

In modern flight booking systems, streaming fare updates and reservations through distributed microservices is common. These pipelines must be retry-resilient, ensuring that transient failures or replays don’t cause duplicate bookings or stale pricing. A core strategy is idempotency: each event (e.g., a fare-update or booking command) carries a unique identifier so processing it more than once has no adverse effect.

Read on to learn more. For reference, idempotence is a property of an operation where you can run through the operation as many times as you wish and will always end up at the same result. In the data operations world, this ties to the final state in a database. If I run a process once and it adds three rows to the database, I should be able to run the process a second time and end up with those exact three rows, no more, no fewer, and no different.
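To make that concrete, here is a minimal sketch of an idempotent handler, assuming each event carries a unique ID as the excerpt describes. The class and field names are hypothetical, and an in-memory set stands in for whatever durable store a real pipeline would use to remember processed IDs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BookingEvent:
    event_id: str  # unique identifier carried by the event (hypothetical field name)
    flight: str
    seats: int


@dataclass
class BookingStore:
    """In-memory stand-in for a database plus a table of processed event IDs."""
    seats_by_flight: dict = field(default_factory=dict)
    seen_event_ids: set = field(default_factory=set)

    def apply(self, event: BookingEvent) -> bool:
        """Apply the event once; return False if it was already processed."""
        if event.event_id in self.seen_event_ids:
            return False  # retry or replay: leave the state untouched
        current = self.seats_by_flight.get(event.flight, 0)
        self.seats_by_flight[event.flight] = current + event.seats
        self.seen_event_ids.add(event.event_id)
        return True


store = BookingStore()
events = [
    BookingEvent("evt-1", "AA100", 2),
    BookingEvent("evt-2", "AA100", 1),
    BookingEvent("evt-1", "AA100", 2),  # duplicate delivery after a retry
]
results = [store.apply(e) for e in events]

# Replaying the whole batch a second time should not change the final state.
state_after_first_run = dict(store.seats_by_flight)
replay_results = [store.apply(e) for e in events]
```

After the first pass the store holds three seats for the flight, and it still holds three after the replay. In a real system, the ID check and the state change would need to happen atomically (for example, in one database transaction) for this guarantee to hold.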

Confluent Schema Registry Support in Fabric Eventstreams

Published 2025-07-18 by Kevin Feasel

Xu Jiang announces a new preview feature:

We are pleased to announce that Eventstream’s Confluent Cloud for Apache Kafka streaming connector now supports decoding data from Confluent Cloud for Apache Kafka topics that are associated with a data contract in Confluent Schema Registry.

Click through to see how this currently works.

Diskless Kafka in Practice

Published 2025-06-19 by Kevin Feasel

Hugh Evans lays it out:

I joined Aiven as a Developer Advocate in May, shortly after the Kafka Improvement Proposal KIP-1150: Diskless Topics was announced, which reduces the total cost of ownership of Kafka by up to 80%! It was very exciting to join Aiven just as the streaming team were making this major contribution to open source but I wanted to take my time to understand the KIP before sharing my thoughts.

In this article I’ll share my first impressions of Diskless Kafka, walk you through a simple example you can use to experiment with Diskless, and highlight some of the great resources that are out there for learning about the topic. First though, what actually is Diskless Kafka?

Click through for that answer, as well as more.

Consumer Group Rebalancing in Kafka and KIP-848

Published 2025-06-04 by Kevin Feasel

Jonathan Lacefield gives us a heads-up:

Historically, Kafka has relied on what we now call the “classic” rebalance protocol. This protocol evolved, as it was initially dominated by the “eager” assignment strategy. Eager rebalancing worked on a stop-the-world principle: Any change in group membership (consumer joining/leaving) or topic metadata triggered a complete halt. All consumers revoked their partitions, a leader computed a new assignment, and partitions were redistributed before processing could resume. This caused significant downtime, especially in dynamic environments.

To mitigate this, the cooperative assignment strategy was introduced within the classic protocol. Cooperative rebalancing reduced downtime by allowing consumers to keep partitions unaffected by the rebalance, revoking only those needing reassignment.
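The difference between the two strategies is easy to see in a toy model. The sketch below is not Kafka's assignor code: the function names and the simple "sticky" assignment logic are assumptions for illustration, and it only counts which partitions each consumer would have to give up when a third consumer joins a two-member group.

```python
def sticky_assign(old, consumers):
    """Balance partitions across consumers, keeping existing ownership where possible."""
    total = sum(len(parts) for parts in old.values())
    base, extra = divmod(total, len(consumers))
    quota = {c: base + (1 if i < extra else 0) for i, c in enumerate(consumers)}

    # Partitions owned by consumers that left the group must be reassigned.
    pool = [p for c, parts in old.items() if c not in quota for p in parts]

    new = {}
    for c in consumers:
        owned = old.get(c, [])
        new[c] = owned[:quota[c]]        # keep up to the quota
        pool.extend(owned[quota[c]:])    # surplus goes back to the pool
    for c in consumers:
        while len(new[c]) < quota[c]:
            new[c].append(pool.pop(0))   # fill gaps from the pool
    return new


def eager_revocations(old):
    """Eager: every consumer revokes every partition before reassignment."""
    return {c: list(parts) for c, parts in old.items()}


def cooperative_revocations(old, new):
    """Cooperative: a consumer revokes only the partitions it no longer owns."""
    return {c: [p for p in parts if p not in new.get(c, [])] for c, parts in old.items()}


old = {"c1": [0, 1, 2], "c2": [3, 4, 5]}
new = sticky_assign(old, ["c1", "c2", "c3"])  # c3 joins the group

eager = eager_revocations(old)
cooperative = cooperative_revocations(old, new)
eager_count = sum(len(parts) for parts in eager.values())
cooperative_count = sum(len(parts) for parts in cooperative.values())
```

In this model the eager strategy pauses all six partitions, while the cooperative strategy revokes only the two that actually move to the new consumer, so the other four keep processing throughout.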

Read on to learn about some of the challenges that exist with rebalancing, and what KIP-848 promises to do.
