Press "Enter" to skip to content

Category: Streaming

Change Event Streaming in SQL Server 2025

Tomaz Kastrun continues an advent of SQL Server 2025. Day 20 takes a look at change event streaming:

Change event streaming (CES) is a data integration capability that streams SQL Server data changes directly into Azure Event Hubs. It captures incremental data changes and publishes them to an Azure Event Hubs destination in near real-time. The captured changes are inserts, updates, and deletes (DML), serialized as JSON (CloudEvents) and streamed to the Event Hub.

CES can be used for many different use cases: monitoring, auditing, building an event-driven system on top of your on-prem database with minimal overhead and minimal changes to the database, synchronising data across systems (platforms, on-prem and cloud solutions, etc.), and many more.
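Not in the series itself, but to make the destination side concrete: once CES is publishing, consuming the events is ordinary Event Hubs client work. A minimal sketch in Python with the azure-eventhub package, where the hub name and the exact shape of the change payload are my assumptions rather than anything the posts prescribe:

```python
import json
from azure.eventhub import EventHubConsumerClient

CONN_STR = "<event-hubs-connection-string>"  # placeholder

def on_event(partition_context, event):
    # CES publishes each DML change as a CloudEvent-serialized JSON document.
    cloud_event = json.loads(event.body_as_str())
    # 'type' and 'data' are standard CloudEvents attributes; the exact shape
    # of the change payload inside 'data' is an assumption here.
    print(cloud_event.get("type"), cloud_event.get("data"))
    partition_context.update_checkpoint(event)

client = EventHubConsumerClient.from_connection_string(
    CONN_STR,
    consumer_group="$Default",
    eventhub_name="sqlserver-changes",  # placeholder hub name
)
with client:
    client.receive(on_event=on_event, starting_position="-1")  # from the start
```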

Day 21 continues this look.

We have looked into the settings of SQL Server and generating a SAS token. Now we need to set up the required Azure services.
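Since the setup hinges on a SAS token, here is a hedged sketch of generating one programmatically. The HMAC-SHA256 signature over the URL-encoded resource URI and expiry is Microsoft's documented scheme for Event Hubs; the namespace, hub, and policy names are placeholders:

```python
import base64, hashlib, hmac, time, urllib.parse

def generate_sas_token(resource_uri: str, key_name: str, key: str,
                       ttl_seconds: int = 3600) -> str:
    """Build an Event Hubs SAS token (documented HMAC-SHA256 scheme)."""
    expiry = int(time.time()) + ttl_seconds
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    string_to_sign = f"{encoded_uri}\n{expiry}"
    signature = base64.b64encode(
        hmac.new(key.encode("utf-8"), string_to_sign.encode("utf-8"),
                 hashlib.sha256).digest()
    ).decode("utf-8")
    return (f"SharedAccessSignature sr={encoded_uri}"
            f"&sig={urllib.parse.quote_plus(signature)}"
            f"&se={expiry}&skn={key_name}")

# Placeholder namespace, hub, and policy values:
token = generate_sas_token(
    "https://mynamespace.servicebus.windows.net/sqlserver-changes",
    "send-policy", "<shared-access-key>")
```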

Yes, it does cost extra money because of the Azure connection, but as long as you don’t have a mandate to be 100% on-premises, I think Change Event Streaming has the potential to be quite powerful for moving data between systems. This is exactly the sort of thing that Event Hubs (or other log-based systems similar to Apache Kafka) do quite well.


Stream or Batch Ordering with Apache Iceberg

Jack Vanlightly shows some tradeoffs:

Today I want to talk about stream analytics, batch analytics, and Apache Iceberg. Stream and batch analytics work differently, but both can be built on top of Iceberg, and due to their differences there can be a tug-of-war over the Iceberg table itself. In this post I am going to use two real-world systems, Apache Fluss (streaming tabular storage) and Confluent Tableflow (Kafka-to-Iceberg), as a case study for these tensions between stream and batch analytics.
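One concrete way the tension shows up is snapshot accumulation: a streaming writer committing every few seconds leaves a long trail of small snapshots that batch-oriented maintenance later has to compact. A minimal sketch with PyIceberg for inspecting that trail, assuming a configured catalog named default and a table db.events (both placeholders):

```python
from pyiceberg.catalog import load_catalog

# Placeholder catalog and table names; assumes a .pyiceberg.yaml config.
catalog = load_catalog("default")
table = catalog.load_table("db.events")

# Each streaming commit adds a snapshot; frequent commits mean a long list
# of small snapshots that batch maintenance jobs later compact away.
for snapshot in table.snapshots():
    print(snapshot.snapshot_id, snapshot.timestamp_ms, snapshot.summary)
```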

Read on for a summary of how two opposite ideas can both be perfectly reasonable.


Scaling Kafka Streams Applications

The Confluent employee mines have a new article:

As the adoption of real-time data processing accelerates, the ability to scale stream processing applications to handle high-volume traffic is paramount. Apache Kafka®, the de facto standard for distributed event streaming, provides a powerful and scalable library in Kafka Streams for building such applications. 

Scaling a Kafka Streams application effectively involves a multi-faceted approach that encompasses architectural design, configuration tuning, and diligent monitoring. This guide will walk you through the essential strategies and best practices to ensure your Kafka Streams applications can gracefully handle massive throughput.
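Kafka Streams itself is a Java library, but the scale-out mechanic the guide leans on, parallelism capped by the input partition count and rebalanced across instances in one consumer group, can be sketched with the Python confluent-kafka client. Broker, group, and topic names are placeholders:

```python
from confluent_kafka import Consumer

# Run several copies of this process: Kafka rebalances the topic's
# partitions across them, the same mechanic that caps a Kafka Streams
# app's parallelism at the number of input partitions.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder
    "group.id": "orders-processor",          # placeholder; same id = one group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])  # placeholder topic

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        # Per-record processing would go here.
        print(msg.partition(), msg.value())
finally:
    consumer.close()
```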

The post gets into some details around the kinds of limits you’ll hit during scaling, scale-up versus scale-out, and configuration settings to help with that scale.


Private Endpoints in Fabric Eventstream Now GA

Alex Lin makes an announcement:

We’re excited to announce the General Availability of Managed Private Endpoints (MPE) in Fabric Eventstream. This network security feature allows you to stream data from Azure resources to Fabric over a private and secure network without the complexity of manual network configurations.

Read on to see what private endpoints give you and what’s new for general availability.


Event Streaming in Microsoft Fabric

Rayis Imayev streams some data:

In my post last week (https://datanrg.blogspot.com/2025/06/salesforce-cdc-data-integration.html), I talked about Salesforce Change Data Capture (CDC) event data streaming, where the initial event destination was file storage in Azure. But what if we anticipate a higher volume of incoming Salesforce source data or the addition of a new data feed? This could create the need for an alternative method of managing incoming events.

Read on to learn more.


Spark Streaming plus Drools

Ram Ghadiyaram builds a tool:

Near real-time decision-making systems are critical for modern business applications. Integrating Apache Spark (Streaming) and Drools provides scalability and flexibility, enabling efficient handling of rule-based decision-making at scale. This article showcases their integration through a loan approval system, demonstrating its architecture, implementation, and advantages.  
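Drools is a Java rules engine, so it will not fit in a short Python snippet; as a language-neutral sketch of the same pattern (stream in, evaluate a rule per record, emit a decision), here is PySpark Structured Streaming with the rule reduced to a plain predicate. The loan fields and the 50,000 threshold are invented for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("loan-rules-sketch").getOrCreate()

# The built-in "rate" source stands in for a real event stream.
events = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Fabricate loan-like fields from the rate source, purely for illustration.
loans = events.select(
    F.col("value").alias("application_id"),
    (F.col("value") % 100 * 1000).alias("amount"),
)

# The "rule": in the article, this decision lives in Drools instead.
decisions = loans.withColumn("approved", F.col("amount") <= 50000)

query = decisions.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```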

Click through for a bit of sample code.


Real-Time Data Streaming in Snowflake

Anil Kumar Moka streams some data:

Real-time data ingestion has become essential for modern analytics and operational intelligence. Organizations across industries need to process data streams from IoT sensors, financial transactions, and application events with minimal latency. Snowflake offers two robust approaches to meet these real-time data needs: Snowpipe for near-real-time file-based streaming and Direct Streaming via Snowpark API for true real-time data integration.

This guide explores both options in depth, providing detailed implementations with explanation of code parameters, performance comparisons, and practical recommendations to help you choose the right approach for your specific use case.
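For a sense of the Snowpipe half, the snowflake-ingest Python package wraps Snowpipe's REST endpoint: you point it at a named pipe and tell it which staged files to load. Every identifier below is a placeholder:

```python
from snowflake.ingest import SimpleIngestManager, StagedFile

# All identifiers here are placeholders.
ingest_manager = SimpleIngestManager(
    account="myaccount",
    host="myaccount.snowflakecomputing.com",
    user="INGEST_USER",
    pipe="MYDB.PUBLIC.EVENTS_PIPE",
    private_key="<pem-private-key>",
)

# Tell Snowpipe about files already copied to the pipe's stage; Snowflake
# loads them asynchronously, which is why this path is "near real-time".
response = ingest_manager.ingest_files([StagedFile("events_001.json", None)])
print(response["responseCode"])
```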

Click through to see how it works. I’ll only make one semi-snarky comment: ‘real-time’ doesn’t mean “takes several seconds,” but I realize I’m the one tilting at windmills here.


Building a Multi-Agent Orchestrator with Flink and Kafka

Sean Falconer builds an orchestration engine:

Just as some problems are too big for one person to solve, some tasks are too complex for a single artificial intelligence (AI) agent to handle. Instead, the best approach is to decompose problems into smaller, specialized units so that multiple agents can work together as a team.

This is the foundation of a multi-agent system—networks of agents, each with a specific role, collaborating to solve larger problems.
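Under the hood, this kind of collaboration often reduces to role-specific topics: each agent consumes the topic for its role and publishes results for the next role. A minimal sketch with the Python confluent-kafka client, with the roles and topic names invented for illustration:

```python
from confluent_kafka import Consumer, Producer

# Placeholder names; each role in the pipeline gets its own topic.
IN_TOPIC, OUT_TOPIC = "tasks.research", "tasks.summarize"

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "research-agent",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe([IN_TOPIC])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # A real agent would call a model here; this sketch just tags the
    # task and hands it to the next role in the pipeline.
    producer.produce(OUT_TOPIC, msg.value() + b" [researched]")
    producer.flush()
```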

Read on for the overview. There’s also a code repository and a free e-book on the topic.


Troubleshooting an Apache Flink Job Not Producing Results

Wade Waldron digs in:

Imagine that you have built an Apache Flink® job. It collects records from Apache Kafka®, performs a time-based aggregation on those records, and emits a new record to a different topic. With your excitement high, you run the job for the first time, and are disappointed to discover that nothing happens. You check the input topic and see the data flowing, but when you look at the output topic, it’s empty.

In many cases, this is an indication that there is a problem with watermarks. But what is a watermark?
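As a preview of where that usually lands, here is a hedged PyFlink sketch of a watermark strategy: assign timestamps from the event itself, bound the expected out-of-orderness, and mark idle partitions so a quiet partition cannot hold the watermark back. The five-second and one-minute values, and the record schema, are illustrative:

```python
from pyflink.common import Duration, WatermarkStrategy
from pyflink.common.watermark_strategy import TimestampAssigner

class EventTimeAssigner(TimestampAssigner):
    def extract_timestamp(self, value, record_timestamp):
        # Assumes each record carries an epoch-millisecond event time
        # in its first field (an illustrative schema, not Wade's).
        return value[0]

watermark_strategy = (
    WatermarkStrategy
    .for_bounded_out_of_orderness(Duration.of_seconds(5))
    .with_timestamp_assigner(EventTimeAssigner())
    # Without this, one idle Kafka partition stalls the watermark and the
    # windowed aggregation never fires: the symptom described above.
    .with_idleness(Duration.of_minutes(1))
)
# stream = env.from_source(kafka_source, watermark_strategy, "events")
```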

Read on for a primer on watermarks, followed by an explanation of the common solution to the problem Wade describes.
