Category: Streaming

Tuning Kafka Connect Source Connectors

Catalin Pop makes things faster:

Kafka Connect is an open source data integration tool that simplifies the process of streaming data between Apache Kafka® and other systems. Kafka Connect has two types of connectors: source connectors and sink connectors. Source connectors allow you to read data from various sources and write it to Kafka topics. Sink connectors send data from the topics to another endpoint. This blog post discusses how to tune your source connectors to help you get the best throughput out of your compute resources. 

This includes which elements are tunable, metrics you’ll want to pay attention to along the way, and a detailed example.

Flink Streaming Use Cases for Kafka Users

Jean-Sebastien Brunner gives us some use cases:

In Part One of our “Inside Flink” blog series, we explored the critical role of stream processing and why developers are increasingly choosing Apache Flink® over other frameworks. 

In this second installment, we’ll showcase how innovative teams across every industry and size are putting stream processing into practice – from streaming data pipelines to train ML models or more timely analytics to fraud detection in finance and real-time inventory management in retail. We’ll also discuss how Flink is uniquely suited to support a wide spectrum of use cases and helps teams uncover immediate insights in their data streams and react to events in real time.

This article stays more at the “art of the possible” level rather than drilling into how we can do it.

Versioned State Store in Kafka Streams

Victoria Xia announces new functionality in Apache Kafka 3.5:

Since the introduction of stream processing, there have been three certainties in life: death, taxes, and out-of-order data. As a stream processing library built for Apache Kafka, Kafka Streams processes data in offset order. When out-of-order data is present, offset order differs from timestamp order and care must be taken to ensure that processing results respect timestamp order where appropriate. The introduction of versioned state stores to Kafka Streams in the Apache Kafka 3.5 release is a huge milestone in this direction.

In this blog post, I’ll address the what, why, and how of versioned stores in Kafka Streams, including what they are, why you might like to use them, how to get started, and a couple of things to watch out for when upgrading.

Read on to see what this entails and how you can try it out yourself.
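
To make the idea concrete, here is a minimal sketch of what "versioned" means, written as a toy in-memory Python class rather than anything from Kafka Streams itself. The class name `VersionedStore`, its methods, and the sample keys and timestamps are all my own assumptions for illustration. The real feature also deals with history retention and persistence, which this ignores.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Toy versioned key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        # Per key: parallel lists of timestamps (kept sorted) and values.
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def put(self, key, value, timestamp):
        """Record a value for the key at the given event timestamp."""
        ts_list = self._timestamps[key]
        idx = bisect.bisect_left(ts_list, timestamp)
        if idx < len(ts_list) and ts_list[idx] == timestamp:
            # Same timestamp seen again: overwrite that version.
            self._values[key][idx] = value
        else:
            # Insert in timestamp order, regardless of arrival order.
            ts_list.insert(idx, timestamp)
            self._values[key].insert(idx, value)

    def get(self, key, as_of=None):
        """Return the latest value, or the value valid as of a timestamp."""
        ts_list = self._timestamps.get(key, [])
        if not ts_list:
            return None
        if as_of is None:
            return self._values[key][-1]
        idx = bisect.bisect_right(ts_list, as_of) - 1
        return self._values[key][idx] if idx >= 0 else None


store = VersionedStore()
store.put("account-1", "gold", 10)
store.put("account-1", "platinum", 30)
store.put("account-1", "silver", 20)  # arrives late, out of order

print(store.get("account-1"))            # platinum
print(store.get("account-1", as_of=25))  # silver
print(store.get("account-1", as_of=5))   # None
```

The late-arriving record at timestamp 20 does not clobber the newer value at timestamp 30, and a lookup "as of" timestamp 25 still sees it. That is the gist of respecting timestamp order rather than offset order.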

Stream Processing with Flink and Kafka

Konstantin Knauf starts a new series:

There was a huge amount of buzz about Apache Flink® at this year’s Kafka Summit London. From an action-packed keynote to standing-room only breakout sessions, it’s clear that the Apache Kafka® community is hungry to learn more about Flink and how the stream processing framework fits into the modern data streaming stack.

That’s why we’re excited to introduce our new “Inside Flink” blog series that takes a deeper look at why developers and organizations everywhere are shifting their stream processing technologies to Flink. Our first blog post explains what Flink is and how it can enhance your streaming use cases running on Kafka. Future topics will include common Flink use cases, an inside look at Flink SQL, and much more.

Click through for the first post in the series, which covers what Flink is and how the two products can interoperate.

Contrasting Spark and Flink for Streaming Use Cases

Deepthi Mohan and Karthi Thyagarajan contrast two products:

Apache Flink and Apache Spark are both open-source, distributed data processing frameworks used widely for big data processing and analytics. Spark is known for its ease of use, high-level APIs, and the ability to process large amounts of data. Flink shines in its ability to handle processing of data streams in real-time and low-latency stateful computations. Both support a variety of programming languages, scalable solutions for handling large amounts of data, and a wide range of connectors. Historically, Spark started out as a batch-first framework and Flink began as a streaming-first framework.

In this post, we share a comparative study of streaming patterns that are commonly used to build stream processing applications, how they can be solved using Spark (primarily Spark Structured Streaming) and Flink, and the minor variations in their approach. Examples cover code snippets in Python and SQL for both frameworks across three major themes: data preparation, data processing, and data enrichment. If you are a Spark user looking to solve your stream processing use cases using Flink, this post is for you. We do not intend to cover the choice of technology between Spark and Flink because it’s important to evaluate both frameworks for your specific workload and how the choice fits in your architecture; rather, this post highlights key differences for use cases that both these technologies are commonly considered for.

Read on for an analysis of the two products.

Azure Stream Analytics No-Code Editor

Xu Jiang shows off a new designer:

Azure Stream Analytics is a fully managed stream processing engine designed to analyze and process large volumes of streaming data with sub-millisecond latencies. Using a SQL-like query language, it empowers you to analyze your streaming data efficiently. It only takes a few clicks to connect to multiple sources and sinks, creating a Stream Analytics job. 

The no-code editor offers an intuitive user experience that enables you to develop Stream Analytics jobs effortlessly, using drag-and-drop functionality, without having to write any code. It further simplifies Stream Analytics job development experience. With just a few clicks, you can quickly develop jobs to handle diverse scenarios in just minutes. It is available in the Azure Event Hubs portal, and now in Azure Stream Analytics portal as well.

Read on to see what it looks like and what you can do with it.

Contrasting Kafka and Pulsar

Tessa Burk performs a comparison:

Apache Kafka® and Apache Pulsar™ are 2 popular message broker software options. Although they share certain similarities, there are big differences between them that impact their suitability for various projects.  

In this comparison guide, we will explore the functionality of Kafka and Pulsar, explain the differences between the software, who would use them, and why.  

Click through for that comparison. I haven’t used Pulsar before, so it’s interesting to see this sort of functionality and community comparison.

Building an Azure Stream Analytics Query

Alex Lin takes us through the process:

As a developer, your journey with Azure Stream Analytics (ASA) can be divided into several stages, each with its own set of challenges and requirements. In this blog post, we’ll walk you through the typical developer journey in ASA, from the initial setup to production deployment. Along the way, we’ll explore the various development tools and best practices that will help you build a Stream Analytics job. 

Click through for the demonstration.

Testing Message Ordering in Kafka

Francesco Tisiot puts a claim to the test:

One of Apache Kafka®’s most known mantras is “it preserves the message ordering per topic-partition”, but is it always true? In this blog post we’ll analyze a few real scenarios where accepting the dogma without questioning it could result in unexpected, and erroneous, sequences of messages.

There’s a lot more to this than I realized, and Francesco does a great job of explaining it.
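
As one illustration of how ordering can slip, here is a small Python simulation of a commonly cited scenario: a producer with several unacknowledged batches in flight, where an early batch fails and gets retried after later ones have already landed. This is a toy model of my own, not code from the linked post. The function `simulate_send` and its parameters are hypothetical, and it ignores real-world details such as idempotent producers.

```python
def simulate_send(batches, fail_first_attempt, max_in_flight):
    """Return the order in which batches land in one partition's log.

    batches: batch ids in the order the producer sends them.
    fail_first_attempt: ids whose first attempt fails and is retried once.
    max_in_flight: how many unacknowledged batches may be outstanding.
    """
    log = []
    pending = list(batches)
    attempted = set()
    while pending:
        window = pending[:max_in_flight]
        retry = []
        for batch in window:
            if batch in fail_first_attempt and batch not in attempted:
                attempted.add(batch)
                retry.append(batch)  # failed: queued for a retry
            else:
                log.append(batch)    # succeeded: appended to the log
        # Retried batches go out again before anything not yet sent.
        pending = retry + pending[len(window):]
    return log


# Three batches in flight at once; batch 1 fails on its first attempt.
print(simulate_send([1, 2, 3], {1}, max_in_flight=3))  # [2, 3, 1]

# One batch in flight at a time; the retry happens before batch 2 is sent.
print(simulate_send([1, 2, 3], {1}, max_in_flight=1))  # [1, 2, 3]
```

In this model, the single partition still stores exactly what it received in the order it received it. The surprise comes from the producer side, where a retry can change what "the order it received it" turns out to be.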

An Overview of Kafka Streams

The Instaclustr team explains how stream processing works in Kafka Streams:

Kafka Streams is a client library providing organizations with a particularly efficient framework for processing streaming data. It offers a streamlined method for creating applications and microservices that must process data in real-time to be effective. Using the Streams API within Apache Kafka, the solution fundamentally transforms input Kafka topics into output Kafka topics. The benefits are important: Kafka Streams pairs the ease of utilizing standard Java and Scala application code on the client end with the strength of Kafka’s robust server-side cluster architecture.

Read on for an overview of how it works. And if you haven’t already, check out the prior post on Kafka so that you can experience the same slight mental perturbations I did when reading about “real-time” responses.
