Press "Enter" to skip to content

Category: Kafka / Flink

Preparing a Fetch Operation in a Kafka Consumer

Danica Fine continues a series on Kafka internals:

Welcome back to the third installment of our blog series where we’re diving into the beautiful black box that is Apache Kafka® to better understand how we interact with the cluster through producer and consumer clients.

Earlier in the series, we took a look at the Kafka producer to see how the client works before following a produce request as it’s processed by the cluster.

In this post, we’ll switch our attention to Kafka Consumer clients to see how consumers interact with the brokers, coordinate their partitions, and send requests to read data from your Kafka topics.

Read on to see what it takes for a consumer to operate in Apache Kafka.
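To make the fetch path concrete, here is a minimal sketch of the Java consumer loop that drives those requests. The broker address, group ID, and topic name are placeholders of my own, not anything from Danica's post:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class FetchLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group"); // joining a group triggers partition coordination
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // fetch.min.bytes (and fetch.max.wait.ms) shape the fetch requests sent to the brokers
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic")); // placeholder topic
            while (true) {
                // each poll() drives the underlying fetch requests the post walks through
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```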


Working with the Apache Flink Table API

Martijn Visser takes us through the Flink Table API:

Apache Flink® offers a variety of APIs that provide users with significant flexibility in processing data streams. Among these, the Table API stands out as one of the most popular options. Its user-friendly design allows developers to express complex data processing logic in a clear and declarative manner, making it particularly appealing for those who want to efficiently manipulate data without getting bogged down in intricate implementation details.

At this year’s Current, we introduced support for the Flink Table API in Confluent Cloud for Apache Flink® to enable customers to use Java and Python for their stream processing workloads. The Flink Table API is also supported in Confluent Platform for Apache Flink®, which launched in limited availability and supports all Flink APIs out of the box.

This introduction highlights its capabilities, how it integrates with other Flink APIs, and provides practical examples to help you get started. Whether you are working with real-time data streams or static datasets, the Table API simplifies your workflow while maintaining high performance and flexibility. If you want to go deeper into the details of how Table API works, we encourage you to check out our Table API developer course.

Read on to learn more about how the Table API compares to other interfaces.
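For a taste of that declarative style, here is a minimal Table API sketch in Java. The orders table and its datagen source are hypothetical stand-ins I'm using for illustration, not anything from Martijn's post:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import static org.apache.flink.table.api.Expressions.$;

public class TableApiSketch {
    public static void main(String[] args) {
        TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical source table backed by Flink's built-in datagen connector
        env.executeSql(
            "CREATE TABLE orders (order_id STRING, amount DOUBLE) " +
            "WITH ('connector' = 'datagen', 'rows-per-second' = '5')");

        // Declarative pipeline: filter and project without touching low-level operators
        Table result = env.from("orders")
                .filter($("amount").isGreater(100))
                .select($("order_id"), $("amount"));

        result.execute().print();
    }
}
```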


Tracking Airport Traffic with Flink, Kafka, and NiFi

Tim Spann builds an app:

The above link utilizes the standard REST endpoint and enhances it by setting the beginning date using NiFi's Expression Language to get the current time in UNIX format in seconds. In this example, I am looking at the last week of data for the airport departures and arrivals in the second URL.

We iterate through a list of the largest airports in the United States doing both departures and arrivals since they use the same format.

Read the article to learn more about how you can tie it all together. You can also check out Tim’s GitHub repo to grab the code.
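To show the same idea outside NiFi, here is a rough Java sketch that computes the one-week UNIX-seconds window and loops over airports for both departures and arrivals. The endpoint shape and airport codes are placeholders of mine; Tim's flow does this with NiFi processors and Expression Language instead:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;

public class AirportFlights {
    public static void main(String[] args) throws Exception {
        // Same idea as the NiFi expression: current time in UNIX seconds, minus one week
        long end = Instant.now().getEpochSecond();
        long begin = Instant.now().minus(7, ChronoUnit.DAYS).getEpochSecond();

        HttpClient client = HttpClient.newHttpClient();
        // Departures and arrivals share a URL format, so one loop covers both directions
        for (String airport : List.of("KATL", "KLAX", "KORD", "KDFW")) {
            for (String direction : List.of("departure", "arrival")) {
                URI uri = URI.create(String.format(
                        "https://example.org/api/flights/%s?airport=%s&begin=%d&end=%d",
                        direction, airport, begin, end)); // hypothetical endpoint shape
                HttpResponse<String> response = client.send(
                        HttpRequest.newBuilder(uri).GET().build(),
                        HttpResponse.BodyHandlers.ofString());
                System.out.println(airport + " " + direction + ": " + response.statusCode());
            }
        }
    }
}
```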


Kafka Internals: Handling a Producer Request

Danica Fine continues a series on Kafka internals:

Welcome to the second installment of our blog series to understand the inner workings of the beautiful black box that is Apache Kafka®. 

We’re diving headfirst into Kafka to see how we actually interact with the cluster through producers and consumers. Along the way, we explore the configurations that affect each step of this epic journey and the metrics that we can use to more effectively monitor the process. 

In the last blog, we explored what the Kafka producer client does behind the scenes each time we call producer.send() (or similar, depending on your language of choice). In this post, we follow our brave hero, a well-formed produce request, on its way to the broker to be processed and have its data stored on the cluster.

Click through to learn more about how it all works.
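As a refresher on the client side of that journey, here is a minimal Java producer whose send() call kicks off the produce request the post follows. The broker, topic, key, and value are placeholders:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class ProduceRequestDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for the broker-side acknowledgment step

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() batches records into produce requests behind the scenes
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"), (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    // the callback fires once the broker has processed the produce request
                    System.out.printf("stored at partition=%d offset=%d%n",
                            metadata.partition(), metadata.offset());
                }
            });
            producer.flush();
        }
    }
}
```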


Testing Kafka Messages with RecordCaptor

Anton Belyaev shows off an open-source utility:

Let’s take a Telegram bot that forwards requests to the OpenAI API and returns the result to the user as an example. If the request to OpenAI violates the system’s security rules, the client will be notified. Additionally, a message will be sent to Kafka for the behavioral control system so that the manager can contact the user, explain that their request was too sensitive even for our bot, and ask them to review their preferences.

The interaction contracts with services are described in a simplified manner to emphasize the core logic. Below is a sequence diagram demonstrating the application’s architecture. I understand that the design may raise questions from a system architecture perspective, but please approach it with understanding — the main goal here is to demonstrate the approach to writing tests.

Read on to see how it all works, as well as a link to Anton's GitHub repo for testing in Kafka.


Comparing Azure Event Hubs to Apache Kafka

Dharmbir Kashyap makes a comparison:

In the realm of event streaming and real-time data processing, choosing the right platform is critical to the success of your project. Two of the most popular options available today are Azure Event Hubs and Apache Kafka. Both platforms offer robust solutions for handling large volumes of streaming data, but they are designed with different architectures, features, and use cases in mind. This blog post will delve into the key differences between Azure Event Hubs and Kafka, helping you determine which platform is best suited for your specific needs.

Read on for an overview of each product and where each product fits.
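One practical point of overlap worth knowing when comparing the two: Event Hubs exposes a Kafka-compatible endpoint, so a standard Kafka client can often talk to it through configuration changes alone. Here is a sketch; the namespace and connection string are placeholders:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class EventHubsKafkaClient {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder namespace; Event Hubs serves the Kafka protocol on port 9093
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "my-namespace.servicebus.windows.net:9093");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        // Authenticate with the namespace connection string (placeholder below)
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
              + "username=\"$ConnectionString\" password=\"<your-connection-string>\";");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // An otherwise-unmodified Kafka producer can now publish to an event hub as a "topic"
        }
    }
}
```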


Updates in Apache Kafka 3.8

Josep Prat announces a slew of changes:

We are proud to announce the release of Apache Kafka 3.8.0. This release contains many new features and improvements. This blog post will highlight some of the more prominent features. For a full list of changes, be sure to check the release notes.

See the Upgrading to 3.8.0 from any version 0.8.x through 3.7.x section in the documentation for the list of notable changes and detailed upgrade steps.

This also puts Kafka one step closer to getting rid of its ZooKeeper dependency altogether.


Test Isolation with Kafka

Anton Belyaev builds some tests:

The experience of running Kafka in test scenarios has reached a high level of convenience thanks to the use of Testcontainers and enhanced support in Spring Boot 3.1 with the @ServiceConnection annotation. However, writing and maintaining integration tests with Kafka remains a challenge. This article describes an approach that significantly simplifies the testing process by ensuring test isolation and providing a set of tools to achieve this goal. With isolation in place, Kafka tests can be organized so that, at the result-verification stage, there is full access to all messages produced during the test, avoiding the need for forced waits such as Thread.sleep().

This method is suitable for use with Testcontainers, Embedded Kafka, or other ways of running the Kafka service (e.g., a local instance).

Click through for that approach.
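For reference, here is roughly what the Testcontainers-plus-@ServiceConnection starting point looks like in a Spring Boot 3.1+ test; the image tag and test body are placeholders, and Anton's isolation tooling layers on top of a setup like this:

```java
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.testcontainers.service.connection.ServiceConnection;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;

@SpringBootTest
@Testcontainers
class KafkaIsolationTest {

    // @ServiceConnection (Spring Boot 3.1+) wires spring.kafka.bootstrap-servers
    // to the throwaway broker automatically -- no @DynamicPropertySource needed
    @Container
    @ServiceConnection
    static KafkaContainer kafka =
            new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.6.1"));

    @Test
    void contextLoads() {
        // Test code would produce to and consume from the containerized broker here
    }
}
```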


Building a Full-Stack App with Kafka and Node.js

Lucia Cerchie builds an application:

A well-known debate: tabs or spaces? Sure, we could set up a Google Form to collect this data, but where’s the fun in that? Let’s settle the debate, Kafka-style. We’ll use the new confluent-kafka-javascript client (not in general availability yet) to build an app that produces the current state of the vote counts to a Kafka topic and consumes from that same topic to surface them to a JavaScript frontend. 

Why are we using this client in particular? It comes from Confluent and is intended for use with Apache Kafka® and Confluent Platform. It’s compatible with Confluent’s cloud offering as well. It builds on concepts from the two most popular Kafka JavaScript client libraries: KafkaJS and node-rdkafka. The functionality is based on node-rdkafka; however, it also provides a way to interface with the library via methods similar to those in KafkaJS, due to their developer-friendly nature. There are two APIs: the first implements the functionality based on node-rdkafka; the second is a promisified API with methods akin to those in KafkaJS. By choosing this client, we can access wide functionality and have a smooth developer experience via the dev-friendly methods.

Click through for the code and explanation. Meanwhile, tabs in my heart, spaces in my job.


Hot and Cold Partitions for Apache Kafka Data

Gautam Goswami splits the data:

At first, data tiering was a tactic used by storage systems to reduce data storage costs. This involved grouping data that was not accessed as often into more affordable, if less effective, storage array choices. Data that has been idle for a year or more, for example, may be moved from an expensive Flash tier to a more affordable SATA disk tier. Even though they are quite costly, SSDs and flash can be categorized as high-performance storage classes. Smaller datasets that are actively used and require the maximum performance are usually stored in Flash.

Cloud data tiering has gained popularity as customers seek alternative options for tiering or archiving data to a public cloud. Public clouds presently offer a mix of object and file storage options. Object storage classes such as Amazon S3 and Azure Blob (Azure Storage) deliver significant cost efficiency and all the benefits of object storage without the complexities of setup and management. 

Read on for an architecture that uses hot and cold tiers, as well as how you can set it up on an existing Kafka topic.
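On the Kafka side specifically, KIP-405 tiered storage splits a topic between a local (hot) tier and a remote (cold) tier. Assuming a Kafka 3.6+ cluster whose brokers already have remote log storage configured, a sketch like this flips it on for an existing topic via the admin client; the topic name and retention values are placeholders:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class EnableTieredStorage {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "events");
            Collection<AlterConfigOp> ops = List.of(
                // Turn on remote (tiered) storage for this topic
                new AlterConfigOp(new ConfigEntry("remote.storage.enable", "true"),
                        AlterConfigOp.OpType.SET),
                // Hot tier: keep 1 day on local broker disks
                new AlterConfigOp(new ConfigEntry("local.retention.ms", "86400000"),
                        AlterConfigOp.OpType.SET),
                // Overall retention: 30 days; segments older than the local window
                // are served from the cold tier (e.g., object storage)
                new AlterConfigOp(new ConfigEntry("retention.ms", "2592000000"),
                        AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```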
