Press "Enter" to skip to content

Category: Streaming

Streaming Datasets in Power BI

Reza Rad needs data in real time:

Datasets in Power BI can have connection types such as Import, DirectQuery, or Live Connection. However, there is also one specific type of dataset that is different: the streaming dataset. A streaming dataset is built for real-time dashboards and comes with its own setup and configuration options. In this video and article, we’ll talk about this type of dataset.

Reza includes a video as well as a very helpful walkthrough.
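
If you want to push a few test rows into one of these datasets yourself, here is a minimal Python sketch, assuming an API-type streaming dataset whose push URL and column names you substitute for the placeholders below:

```python
import json
from datetime import datetime, timezone

import requests

# Placeholder push URL: Power BI generates this when you create an API streaming
# dataset (see the dataset's "API Info" page in the service).
PUSH_URL = "https://api.powerbi.com/beta/<workspace-id>/datasets/<dataset-id>/rows?key=<key>"

# Column names must match the schema defined on the streaming dataset.
rows = [{
    "sensor_id": "sensor-01",
    "temperature": 72.4,
    "timestamp": datetime.now(timezone.utc).isoformat(),
}]

response = requests.post(
    PUSH_URL,
    data=json.dumps(rows),
    headers={"Content-Type": "application/json"},
)
response.raise_for_status()
```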

Comments closed

Real-Time Streaming ETL with Kafka and Debezium

Dursun Koc doesn’t have time for batched ETL:

Debezium does not extract data using SQL. It uses database log files to track changes in the database, so it has minimal effect on the source system. For more information about Debezium, please visit their website.

After the data is extracted, we need Kafka Connect to stream it into Apache Kafka so we can work with it and reshape it as required. We will then use ksqlDB to reshape the raw data into the form the target system requires. Let’s consider a simple ordering-system database in which we have a customer table, a product table, and an orders table.

Read on for an overview as well as a link to the GitHub repo where you can try this all out.
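
To give a flavor of the moving parts before you dive into the repo, here is a hedged Python sketch of registering a Debezium source connector through the Kafka Connect REST API. A MySQL source is assumed, and the hostnames, credentials, and table list are placeholders rather than the repo’s actual configuration:

```python
import json

import requests

# Illustrative Debezium MySQL source connector; adjust the connector class,
# hosts, credentials, and table list for your environment.
connector = {
    "name": "orders-source",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "database.server.name": "orderdb",  # prefix for the change topics
        "database.include.list": "orderdb",
        "table.include.list": "orderdb.customers,orderdb.products,orderdb.orders",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.orderdb",
    },
}

# Kafka Connect exposes a REST API (port 8083 by default) for managing connectors.
resp = requests.post(
    "http://localhost:8083/connectors",
    data=json.dumps(connector),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
```

Once the connector is running, each committed insert, update, or delete on those tables lands as a change event in its own Kafka topic, ready for ksqlDB to reshape.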

Comments closed

Apache Flink Updates

Danny Cranmer announces Flink 1.15.2:

The Apache Flink Community is pleased to announce the second bug fix release of the Flink 1.15 series.

This release includes 30 bug fixes, vulnerability fixes, and minor improvements for Flink 1.15. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.

We highly recommend all users upgrade to Flink 1.15.2.

In addition to that, Jingsong Lee announces Flink Table Store 0.2.0:

Flink Table Store is data lake storage for ingesting streaming changelogs (updates and deletes) and serving high-performance queries in real time.

As a new type of updatable data lake, Flink Table Store has the following features:

– High-throughput data ingestion while offering good query performance.

– High-performance queries with primary key filters, returning in as little as 100ms.

– Streaming reads are available on lake storage; lake storage can also be integrated with Kafka to provide second-level streaming reads.

Read on for the changes in both platforms.

Comments closed

Watermarking in Spark Structured Streaming

Max Fisher takes us through an important feature for Spark streaming:

When building real-time pipelines, one of the realities that teams have to work with is that distributed data ingestion is inherently unordered. Additionally, in the context of stateful streaming operations, teams need to be able to properly track event time progress in the stream of data they are ingesting for the proper calculation of time-window aggregations and other stateful operations. We can solve for all of this using Structured Streaming.

For example, let’s say we are a team working on building a pipeline to help our company do proactive maintenance on the mining machines that we lease to our customers. These machines always need to be running in top condition, so we monitor them in real time. We will need to perform stateful aggregations on the streaming data to understand and identify problems in the machines.

This is where we need to leverage Structured Streaming and Watermarking to produce the necessary stateful aggregations that will help inform decisions around predictive maintenance and more for these machines.

Read on to see how watermarking works in various scenarios, including when you join together streams.
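
As a rough illustration of the pattern (not Max’s exact code), a PySpark job that applies a watermark before a windowed aggregation might look like this; the source and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("machine-telemetry").getOrCreate()

# Hypothetical streaming source; in practice this would be Kafka, Event Hubs, etc.
events = (
    spark.readStream.format("rate").load()
    .withColumn("machine_id", F.concat(F.lit("machine-"), (F.col("value") % 10).cast("string")))
    .withColumn("event_time", F.col("timestamp"))
    .withColumn("temperature", F.rand() * 100)
)

# The watermark tells Spark how long to wait for late events (10 minutes here)
# before finalizing each 5-minute window and dropping its state.
avg_temps = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "machine_id")
    .agg(F.avg("temperature").alias("avg_temperature"))
)

query = (
    avg_temps.writeStream
    .outputMode("append")   # emits a window only once the watermark passes its end
    .format("console")
    .start()
)
```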

Comments closed

Securing Kafka Streams

Amani Newton gives us a primer on Apache Kafka security:

The largest companies in the world use Apache Kafka® for their real-time streaming data pipelines and applications. Kafka is the basis for the real-time fraud text alerts from your bank and the network-connected medical devices used in your local hospital. Securing customer or patient data as it flows through the Kafka system is crucial. However, out of the box, Kafka has relatively little security enabled. This blog post previews the free Confluent Developer course that teaches the basics of securing your Apache Kafka-based system.

Click through for the overview.
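
As a taste of what the client side of a secured cluster can look like, here is a hedged sketch using the confluent-kafka Python client with SASL_SSL; the broker address, credentials, and certificate path are placeholders:

```python
from confluent_kafka import Consumer

# Illustrative client-side security settings: TLS for encryption in transit,
# SASL/PLAIN for authentication. Broker, credentials, and CA path are placeholders.
consumer = Consumer({
    "bootstrap.servers": "broker.example.com:9093",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",               # or SCRAM-SHA-512, OAUTHBEARER, ...
    "sasl.username": "svc-orders",
    "sasl.password": "change-me",
    "ssl.ca.location": "/etc/ssl/certs/ca.pem",
    "group.id": "orders-consumer",
    "auto.offset.reset": "earliest",
})

consumer.subscribe(["orders"])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```

Authorization (ACLs or role bindings on the broker side) is the other half of the story.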

Comments closed

From Kafka to Azure Data Explorer with Protobuf Data

Anshul Sharma and Ramachandran G do a bit of converting:

Kafka is increasingly becoming a popular choice for scalable message queueing in large data processing workloads. This makes it very popular in IoT-based ecosystems, where there is a large ingress of data before data processing or data storage. Azure Data Explorer is a very powerful time-series and analytics database that suits IoT-scale data ingestion and data querying.

Kafka supports ingestion of data in multiple formats, including JSON, Avro, Protobuf, and String. ADX supports ingestion from Kafka in all of these formats. Due to its excellent schema support, extensibility to various platforms, and compression, protobuf (https://developers.google.com/protocol-buffers) is increasingly becoming a data-exchange choice in IoT-based systems. The ADX Kafka sink connector leverages the Kafka Connect framework and provides an adapter to ingest data from Kafka in all these formats.

The following section provides the configuration needed to support ingestion of protobuf data from Kafka into ADX.

Click through for the high-level architecture and a deeper dive into the process.
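
On the producing side, a hedged Python sketch of publishing protobuf-encoded telemetry to the topic the sink connector reads might look like this; the telemetry_pb2 module is hypothetical, generated from a .proto file with protoc:

```python
from confluent_kafka import Producer

# Hypothetical module generated by `protoc --python_out=. telemetry.proto`,
# defining a Telemetry message with device_id, temperature, and ts fields.
import telemetry_pb2

producer = Producer({"bootstrap.servers": "localhost:9092"})

msg = telemetry_pb2.Telemetry(device_id="device-42", temperature=21.7, ts=1693526400)

# The topic name and the message schema must line up with the sink connector's
# configuration and the ADX table's ingestion mapping.
producer.produce("iot-telemetry", key=msg.device_id, value=msg.SerializeToString())
producer.flush()
```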

Comments closed

Visualizing Kafka Stream Lineage

David Araujo and Julia Peng show off stream lineage in Confluent Cloud:

Stream Lineage is a tool Confluent built to address the lack of data visibility in Kafka and event-driven architectures. Confluent’s Stream Lineage provides an interactive map of all your data flows that enables users to:

1. Understand what data flows are running, both now and at any point in the past

2. Trace where each data flow originated from

3. Track how data is transformed along its journey

4. Observe where each data flow ends up

Read on to see how it works.

Comments closed

Automating Parallelism Decisions in Flink Batch Jobs

Lijie Wang and Zhu Zhu describe Apache Flink’s batch scheduler:

Deciding on proper parallelisms for operators is not easy for many users. For batch jobs, a small parallelism may result in long execution times and large failover regressions, while an unnecessarily large parallelism may result in wasted resources and more overhead in task deployment and network shuffling.

To decide on a proper parallelism, one needs to know how much data each operator needs to process. However, it can be hard to predict the data volume a job will process because it can differ from day to day, and it can be harder or even impossible (due to complex operators or UDFs) to predict the data volume each operator will process.

To solve this problem, we introduced the adaptive batch scheduler in Flink 1.15. The adaptive batch scheduler can automatically decide the parallelism of an operator according to the size of the datasets it consumes.

Read on to see some of the benefits of using the adaptive batch scheduler, as well as some of the decision points it uses along the way.
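
For reference, here is a hedged sketch of the relevant configuration. The option names follow the Flink 1.15 documentation as best I recall and may have changed in later releases, so verify them against your version; they normally live in flink-conf.yaml or are passed with -D at submission, and are only collected here with PyFlink’s Configuration object for readability:

```python
from pyflink.common import Configuration

conf = Configuration()
# Switch the job manager to the adaptive batch scheduler (batch jobs only).
conf.set_string("jobmanager.scheduler", "AdaptiveBatch")
conf.set_string("execution.runtime-mode", "BATCH")
# Leave operator parallelism undecided so the scheduler can pick it per stage.
conf.set_string("parallelism.default", "-1")
# Bounds and sizing hints the scheduler uses when deriving parallelism.
conf.set_string("jobmanager.adaptive-batch-scheduler.min-parallelism", "1")
conf.set_string("jobmanager.adaptive-batch-scheduler.max-parallelism", "128")
conf.set_string("jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task", "8gb")
conf.set_string("jobmanager.adaptive-batch-scheduler.default-source-parallelism", "4")
```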

Comments closed

Request-Response and CQRS in Kafka

Kai Waehner compares two message exchange patterns:

How can I do request-response communication with Apache Kafka? That’s one of the most common questions I get regularly. This blog post explores when (not) to use this message exchange pattern, the differences between synchronous and asynchronous communication, the pros and cons compared to CQRS and event sourcing, and how to implement request-response within the data streaming infrastructure.

Read on to learn more.
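
As a rough sketch of the pattern (not Kai’s implementation), the requester can attach a correlation ID and a reply-to header to the request, then wait for a matching message on the reply topic:

```python
import uuid

from confluent_kafka import Consumer, Producer

BOOTSTRAP = "localhost:9092"        # placeholder broker and topic names
REQUEST_TOPIC = "orders.requests"
REPLY_TOPIC = "orders.replies"

producer = Producer({"bootstrap.servers": BOOTSTRAP})
consumer = Consumer({
    "bootstrap.servers": BOOTSTRAP,
    "group.id": f"requester-{uuid.uuid4()}",  # private group: each requester reads all replies
    "auto.offset.reset": "earliest",
})
consumer.subscribe([REPLY_TOPIC])

# Send the request with a correlation ID and a reply-to header the responder echoes back.
correlation_id = str(uuid.uuid4())
producer.produce(
    REQUEST_TOPIC,
    value=b'{"order_id": 42}',
    headers={"correlation_id": correlation_id, "reply_to": REPLY_TOPIC},
)
producer.flush()

# Block (with a timeout) until a reply carrying our correlation ID arrives.
while True:
    msg = consumer.poll(timeout=30.0)
    if msg is None:
        raise TimeoutError("no reply received")
    if msg.error():
        continue
    headers = dict(msg.headers() or [])
    if headers.get("correlation_id", b"").decode() == correlation_id:
        print("reply:", msg.value())
        break
consumer.close()
```

That blocking wait at the end is the synchronous behavior whose trade-offs the article explores.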

Comments closed

Ingesting Event Hub Telemetry Data with PySpark Streaming

Charles Chukwudozie shows how to read from Event Hubs in Databricks with Python:

Ingesting, storing, and processing millions of telemetry events from a plethora of remote IoT devices and sensors has become commonplace. One of the primary cloud services used to process streaming telemetry events at scale is Azure Event Hubs.

Most documented implementations of Azure Databricks ingestion from Azure Event Hubs are based on Scala.

So, in this post, I outline how to use PySpark on Azure Databricks to ingest and process telemetry data from an Azure Event Hub instance configured without Event Capture.

Click through for the process.
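
A hedged sketch of the usual PySpark pattern with the Azure Event Hubs Spark connector follows (the azure-eventhubs-spark library must be installed on the cluster; the connection string and payload schema below are placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# Placeholder Event Hubs connection string (EntityPath points at the hub).
connection_string = (
    "Endpoint=sb://<namespace>.servicebus.windows.net/;"
    "SharedAccessKeyName=<name>;SharedAccessKey=<key>;EntityPath=<hub>"
)

# The connector expects the connection string to be encrypted with its helper class.
eh_conf = {
    "eventhubs.connectionString": spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string)
}

# The Event Hubs source exposes a binary `body` column plus metadata such as enqueuedTime.
raw = spark.readStream.format("eventhubs").options(**eh_conf).load()

# Hypothetical telemetry payload schema.
schema = StructType([
    StructField("deviceId", StringType()),
    StructField("temperature", DoubleType()),
])

telemetry = (
    raw
    .withColumn("body", F.col("body").cast("string"))
    .withColumn("payload", F.from_json("body", schema))
    .select("enqueuedTime", "payload.*")
)

query = telemetry.writeStream.format("console").outputMode("append").start()
```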

Comments closed