Streaming – Curated SQL

Confluent Schema Registry Support in Fabric Eventstreams

Published 2025-07-18 by Kevin Feasel

Xu Jiang announces a new preview feature:

We are pleased to announce that Eventstream’s Confluent Cloud for Apache Kafka streaming connector now supports decoding data from Confluent Cloud for Apache Kafka topics that are associated with a data contract in Confluent Schema Registry.

Click through to see how this currently works.

Private Endpoints in Fabric Eventstream now GA

Published 2025-07-17 by Kevin Feasel

Alex Lin makes an announcement:

We’re excited to announce the General Availability of Managed Private Endpoints (MPE) in Fabric Eventstream. This network security feature allows you to stream data from Azure resources to Fabric over a private and secure network without the complexity of manual network configurations.

Read on to see what private endpoints give you and what’s new for general availability.

Comments closed

Event Streaming in Microsoft Fabric

Published 2025-07-08 by Kevin Feasel

Rayis Imayev streams some data:

In my post last week (https://datanrg.blogspot.com/2025/06/salesforce-cdc-data-integration.html), I talked about Salesforce Change Data Capture (CDC) event data streaming, where the initial event destination was file storage in Azure. But what if we anticipate a higher volume of incoming Salesforce source data or the addition of a new data feed? This could create the need for an alternative method of managing incoming events.

Read on to learn more.

Comments closed

Spark Streaming plus Drools

Published 2025-06-11 by Kevin Feasel

Ram Ghadiyaram builds a tool:

Near real-time decision-making systems are critical for modern business applications. Integrating Apache Spark (Streaming) and Drools provides scalability and flexibility, enabling efficient handling of rule-based decision-making at scale. This article showcases their integration through a loan approval system, demonstrating its architecture, implementation, and advantages.

Click through for a bit of sample code.

Comments closed

Real-Time Data Streaming in Snowflake

Published 2025-05-09 by Kevin Feasel

Anil Kumar Moka streams some data:

Real-time data ingestion has become essential for modern analytics and operational intelligence. Organizations across industries need to process data streams from IoT sensors, financial transactions, and application events with minimal latency. Snowflake offers two robust approaches to meet these real-time data needs: Snowpipe for near-real-time file-based streaming and Direct Streaming via Snowpark API for true real-time data integration.

This guide explores both options in depth, providing detailed implementations with explanation of code parameters, performance comparisons, and practical recommendations to help you choose the right approach for your specific use case.

Click through to see how it works. I’ll only make one semi-snarky comment that ‘real-time’ doesn’t mean “takes several seconds” but I realize I’m the one tilting at windmills here.

Comments closed

Building a Multi-Agent Orchestrator with Flink and Kafka

Published 2025-05-05 by Kevin Feasel

Sean Falconer builds an orchestration engine:

Just as some problems are too big for one person to solve, some tasks are too complex for a single artificial intelligence (AI) agent to handle. Instead, the best approach is to decompose problems into smaller, specialized units so that multiple agents can work together as a team.

This is the foundation of a multi-agent system—networks of agents, each with a specific role, collaborating to solve larger problems.

Read on for the overview. There’s also a code repository and a free e-book on the topic.

Comments closed

Troubleshooting an Apache Flink Job Not Producing Results

Published 2025-04-18 by Kevin Feasel

Wade Waldron digs in:

Imagine that you have built an Apache Flink® job. It collects records from Apache Kafka®, performs a time-based aggregation on those records, and emits a new record to a different topic. With your excitement high, you run the job for the first time, and are disappointed to discover that nothing happens. You check the input topic and see the data flowing, but when you look at the output topic, it’s empty.

In many cases, this is an indication that there is a problem with watermarks. But what is a watermark?

Read on for a primer on watermarks, followed by an explanation of the common solution to the problem Wade describes.

Comments closed

Handling Errors in Apache Flink Apps

Published 2025-02-07 by Kevin Feasel

Alexis Tekin and Jeremy Ber handle an error:

Data streaming applications continuously process incoming data, much like a never-ending query against a database. Unlike traditional database queries where you request data one time and receive a single response, streaming data applications constantly receive new data in real time. This introduces some complexity, particularly around error handling. This post discusses the strategies for handling errors in Apache Flink applications. However, the general principles discussed here apply to stream processing applications at large.

Read on to see how this all works when you’re hosting a Flink application. This directly relates to Flink applications that live in AWS, though very little in the article is AWS-specific.

Comments closed

Streaming Data to Azure Event Hub via Mockaroo and Kafka API

Published 2025-01-21 by Kevin Feasel

Jasleen Kaur Wahi generates some data:

In a recent project, I faced the need to generate randomized data for transmission to the Azure Event Hub. This hub is a key component of Microsoft Azure, used for real-time data ingestion and processing.

First, let’s take look at how I created this random data. I wanted to come up with a way to make data that looks like what we see in the real world, but without using any real information from users. This made-up data was really important for a bunch of things, like checking if our software works well.

Read on to see how Mockaroo works and the end result. Creating tests for streaming services like Event Hubs is a challenge, so this is an interesting approach to the task.

Comments closed

The Data Streaming Landscape Entering 2025

Published 2024-12-11 by Kevin Feasel

Kai Waehner lays out the state of things:

Data streaming is a new software category. It has grown from niche adoption to becoming a fundamental part of modern data architecture, leveraging open source technologies like Apache Kafka and Flink. With real-time data processing transforming industries, the ecosystem of tools, platforms, and cloud services has evolved significantly. This blog post explores the data streaming landscape of 2025, analyzing key players, trends, and market dynamics shaping this space.

It’s always important to keep the writer’s bias in mind when reading these articles (and we all have biases, whether or not we admit to them). With that preparatory throat-clearing out of the way, Kai does an excellent job laying out the players, the criteria he uses for analysis, and the current state of the field.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Category: Streaming