Press "Enter" to skip to content

Category: Kafka / Flink

Securing a Kafka Ecosystem

Riya has a breakdown of how to protect your Apache Kafka installation and resources around it:

Apache Kafka is the backbone of many real-time data pipelines, making security an essential aspect of its deployment. Protecting your Kafka ecosystem involves implementing encryption to safeguard data, authentication to verify user identities, and authorization to control access. This guide provides a comprehensive overview of these three pillars of securing Kafka, complete with code examples to help you implement best practices.

Click through for demonstrations of encryption, authentication, and authorization.

Leave a Comment

Handling a Consumer Fetch Request in Kafka

Multiple Confluent employees (who apparently don’t get to have names this time around) wrap up a series:

It’s been a long time coming, but we’ve finally arrived at the fourth and final installment of our blog series. In this series, we’ve been peeling back the layers of Apache Kafka® to get a deeper understanding of how best to interact with the cluster using producer and consumer clients.

Read on for the final part, as well as links to previous parts if you missed them.

Comments closed

Apache Kafka 3.9 Now Available

Colin McCabe announces Apache Kafka 3.9:

We are proud to announce the release of Apache Kafka 3.9.0. This is a major release, the final one in the 3.x line. This will also be the final major release to feature the deprecated Apache ZooKeeper® mode. Starting in 4.0 and later, Kafka will always run without ZooKeeper.

That’s a pretty big change, but there are also quite a few other significant changes here to check out.

Comments closed

Minimizing Latency in Kafka Streaming Applications using APIs

Abhishek Goswami doesn’t want to slow down the stream:

Kafka is widely adopted for building real-time streaming applications due to its fault tolerance, scalability, and ability to process large volumes of data. However, in general, Kafka streaming consumers work best only in an environment where they do not have to call external APIs or databases. In a situation when a Kafka consumer must make a synchronous database or API call, the latency introduced by network hops or I/O operations adds up and accumulates easily (especially when the streaming pipeline is performing an initial load of a large volume of data before starting CDC). This can significantly slow down the streaming pipeline and result in the blowing of system resources impacting the throughput of the pipeline. In extreme situations, this may even become unsustainable as Kafka consumers may not be able to commit offsets due to increased latency before the next polling call and get continuously rebalanced by the broker, practically not processing anything yet incrementally consuming more system resources as time passes.

This is a real problem faced by many streaming applications. In this article, we’ll explore some effective strategies to minimize latency in Kafka streaming applications where external API or database calls are inevitable. We’ll also compare these strategies with the alternative approach of separating out the parts of the pipeline that require these external interactions into a separate publish/subscribe-based consumer.

Read on to understand the causes of this latency and several patterns you can use to limit it.

Comments closed

Preparing a Fetch Operation in a Kafka Consumer

Danica Fine continues a series on Kafka internals:

Welcome back to the third installment of our blog series where we’re diving into the beautiful black box that is Apache Kafka® to better understand how we interact with the cluster through producer and consumer clients.

Earlier in the series, we took a look at the Kafka producer to see how the client works before following a produce request as it’s processed by the cluster.

In this post, we’ll switch our attention to Kafka Consumer clients to see how consumers interact with the brokers, coordinate their partitions, and send requests to read data from your Kafka topics.

Read on to see what it takes for a consumer to operate in Apache Kafka.

Comments closed

Working with the Apache Flink Table API

Martijn Visser takes us through the Flink Table API:

Apache Flink® offers a variety of APIs that provide users with significant flexibility in processing data streams. Among these, the Table API stands out as one of the most popular options. Its user-friendly design allows developers to express complex data processing logic in a clear and declarative manner, making it particularly appealing for those who want to efficiently manipulate data without getting bogged down in intricate implementation details.

At this year’s Current, we introduced support for the Flink Table API in Confluent Cloud for Apache Flink® to enable customers to use Java and Python for their stream processing workloads. The Flink Table API is also supported in Confluent Platform for Apache Flink®, which launched in limited availability and supports all Flink APIs out of the box.

This introduction highlights its capabilities, how it integrates with other Flink APIs, and provides practical examples to help you get started. Whether you are working with real-time data streams or static datasets, the Table API simplifies your workflow while maintaining high performance and flexibility. If you want to go deeper into the details of how Table API works, we encourage you to check out our Table API developer course.

Read on to learn more information about how the Table API works in comparison to other interfaces.

Comments closed

Tracking Airport Traffic with Flink, Kafka, and NiFi

Tim Spann builds an app:

The above link utilizes the standard REST link and enhances it by setting the beginning date using NiFi’s Expression language to get the current time in UNIX format in seconds. In this example, I am looking at the last week of data for the airport departures and arrivals in the second URL.

We iterate through a list of the largest airports in the United States doing both departures and arrivals since they use the same format.

Read the article to learn more about how you can tie it all together. You can also check out Tim’s GitHub repo to grab the code.

Comments closed

Kafka Internals: Handling a Producer Request

Danica Fine continues a series on Kafka internals:

Welcome to the second installment of our blog series to understand the inner workings of the beautiful black box that is Apache Kafka®. 

We’re diving headfirst into Kafka to see how we actually interact with the cluster through producers and consumers. Along the way, we explore the configurations that affect each step of this epic journey and the metrics that we can use to more effectively monitor the process. 

In the last blog, we explored what the Kafka producer client does behind the scenes each time we call producer.send() (or similar, depending on your language of choice). In this post, we follow our brave hero, a well-formed produce request, that’s on its way to the broker to be processed and have its data stored on the cluster.

Click through to learn more about how it all works.

Comments closed

Testing Kafka Messages with RecordCaptor

Anton Belyaev shows off an open-source utility:

Let’s take a Telegram bot that forwards requests to the OpenAI API and returns the result to the user as an example. If the request to OpenAI violates the system’s security rules, the client will be notified. Additionally, a message will be sent to Kafka for the behavioral control system so that the manager can contact the user, explain that their request was too sensitive even for our bot, and ask them to review their preferences.

The interaction contracts with services are described in a simplified manner to emphasize the core logic. Below is a sequence diagram demonstrating the application’s architecture. I understand that the design may raise questions from a system architecture perspective, but please approach it with understanding — the main goal here is to demonstrate the approach to writing tests.

Read on to see how it all works, as well as links to Anton’s GitHub repo for testing in Kafka.

Comments closed

Comparing Azure Event Hubs to Apache Kafka

Dharmbir Kashyap makes a comparison:

In the realm of event streaming and real-time data processing, choosing the right platform is critical to the success of your project. Two of the most popular options available today are Azure Event Hub and Apache Kafka. Both platforms offer robust solutions for handling large volumes of streaming data, but they are designed with different architectures, features, and use cases in mind. This blog post will delve into the key differences between Azure Event Hub and Kafka, helping you determine which platform is best suited for your specific needs.

Read on for an overview of each product and where each product fits.

Comments closed