Category: Streaming

Unit Testing Kafka Streams

Published 2017-08-07 by Kevin Feasel

Anuj Saxena shows us how to build mocks for streams in Kafka Streams:

Here, we are using Kafka streams in our applications. We are done with the implementation but again, the most important thing left is testing. This blog is about how to test the application we have created. For this, I’ll be taking the sample app I created in my previous blog for both high-level DSL and low-level processor API.

Traditionally, we test our Kafka application with an integration test for which we need to create a ZooKeeper and a real Kafka broker. After that, we need a mock producer and mock consumer for our app to produce the inputs and receive the outputs. That makes it such a big hassle just to test our app. Testing it for real scenarios and for the actual integration test, this is needed without a doubt.

Click through for an example.

Comments closed

Stream Analytics Into Power BI

Published 2017-08-02 by Kevin Feasel

Rolf Tesmer shows off how to use Azure Stream Analytics to push data in real time via the Power BI API into your Power BI dashboard:

You can push data to the Power BI streaming dataset API in a few ways… but they generally boil down to these 3 options…

Directly call the API from code

You could use something like Azure Function Apps to iteratively pull NEW rows that land in a SQL table, create the API Call, and push the new data directly to the API

See here info on Azure Functions – https://azure.microsoft.com/en-us/services/functions/

Directly call the API from an Azure Logic App

Azure Logic Apps are cool as for simple functions like this you can do pretty much the same as in the code option above but just using drag/drop and WITHOUT writing any code – https://azure.microsoft.com/en-us/services/logic-apps/

Use Azure Stream Analytics to push data into the API

This is leveraging the solution in my previous post to push data from SQL CDC and into an Azure Event Hub, then via Azure Stream Analytics to Power BI

Azure Stream Analytics – https://azure.microsoft.com/en-us/services/stream-analytics/

Azure Event Hubs – https://azure.microsoft.com/en-us/services/event-hubs/

This blog post extends on my previous post – and thus I will be leveraging Option #3 above.

Definitely worth checking out if you are interested in real-time Power BI dashboards.

Comments closed

Kafka Streams Basics

Published 2017-07-25 by Kevin Feasel

Anuj Saxena walks through Kafka Streams and provides a quick example:

The features provided by Kafka Streams:

Highly scalable, elastic, distributed, and fault-tolerant application.
Stateful and stateless processing.
Event-time processing with windowing, joins, and aggregations.
We can use the already-defined most common transformation operation using Kafka Streams DSL or the lower-level processor API, which allow us to define and connect custom processors.
Low barrier to entry, which means it does not take much configuration and setup to run a small scale trial of stream processing; the rest depends on your use case.
No separate cluster requirements for processing (integrated with Kafka).
Employs one-record-at-a-time processing to achieve millisecond processing latency, and supports event-time based windowing operations with the late arrival of records.
Supports Kafka Connect to connect to different applications and databases.

Read on for more details as well as a sample script to get started.

Comments closed

Streaming ETL Using CDC And Event Hub

Published 2017-07-12 by Kevin Feasel

Rolf Tesmer combines Change Data Capture and Event Hubs to build a streaming ETL solution:

The solution picks up the SQL data changes from the CDC Change Tracking system tables, creates JSON messages from the change rows, and then posts the message to an Azure Event Hub. Once landed in the Event Hub an Azure Stream Analytics (ASA) Job distributes the changes into the multiple outputs.

What I found pretty cool was that I could transmit SQL delta changes from source to target in as little as 5 seconds end to end!

There are a bunch of steps, but the end result is worth it.

Comments closed

Real-Time Streaming ETL With Kafka Streams

Published 2017-06-27 by Kevin Feasel

Yeva Byzek has a tutorial using Kafka and Kafka Streams to perform real-time ETL:

Let’s consider an application that does some real-time stateful stream processing with the Kafka Streams API. We’ll run through a specific example of the end-to-end reference architecture and show you how to:

Run a Kafka source connector to read data from another system (a SQLite3 database), then modify the data in-flight using Single Message Transforms (SMTs) before writing it to the Kafka cluster
Process and enrich the data from a Java application using the Kafka Streams API (e.g. count and sum)
Run a Kafka sink connector to write data from the Kafka cluster to another system (AWS S3)

Read the whole thing.

Comments closed

Updates In Apache Kafka

Published 2017-06-23 by Kevin Feasel

Yeva Byzek announces that Apache Kafka 0.11.0.0 is shipping soon:

We are very excited for the GA for Kafka release 0.11.0.0 which is just days away. This release is bringing many new features as described in the previous Log Compaction blog post.

The most notable new feature is Exactly Once Semantics (EOS). Kafka’s EOS capabilities provide more stringent idempotent producer semantics with exactly once, in-order delivery per partition, and stronger transactional guarantees with atomic writes across multiple partitions. Together, these strong semantics make writing applications easier and expand Kafka’s addressable use cases. You can learn more about EOS in the online talk on June 29, 2017.

“Exactly once,” if done right, would be crazy—there’s a reason most brokers are either “at least once” or “best effort.”

Comments closed

Spark Streaming Vs Kafka Streams

Published 2017-06-21 by Kevin Feasel

Mahesh Chand Kandpal contrasts Kafka Streams with Spark Streaming:

The low latency and an easy-to-use event time support also apply to Kafka Streams. It is a rather focused library, and it’s very well-suited for certain types of tasks. That’s also why some of its design can be so optimized for how Kafka works. You don’t need to set up any kind of special Kafka Streams cluster, and there is no cluster manager. And if you need to do a simple Kafka topic-to-topic transformation, count elements by key, enrich a stream with data from another topic, or run an aggregation or only real-time processing — Kafka Streams is for you.

If event time is not relevant and latencies in the seconds range are acceptable, Spark is the first choice. It is stable and almost any type of system can be easily integrated. In addition it comes with every Hadoop distribution. Furthermore, the code used for batch applications can also be used for the streaming applications as the API is the same.

Read on for more analysis.

Comments closed

Kinesis Data Generation

Published 2017-05-19 by Kevin Feasel

Allan MacInnis shows off a new data generation tool for Amazon’s Kinesis:

Amazon Kinesis Streams and Amazon Kinesis Firehose enable you to continuously capture and store terabytes of data per hour from hundreds of thousands of sources. Amazon Kinesis Analytics gives you the ability to use standard SQL to analyze and aggregate this data in real-time. It’s easy to create an Amazon Kinesis stream or Firehose delivery stream with just a few clicks in the AWS Management Console (or a few commands using the AWS CLI or Amazon Kinesis API). However, to generate a continuous stream of test data, you must write a custom process or script that runs continuously, using the AWS SDK or CLI to send test records to Amazon Kinesis. Although this task is necessary to adequately test your solution, it means more complexity and longer development and testing times.

Wouldn’t it be great if there were a user-friendly tool to generate test data and send it to Amazon Kinesis? Well, now there is—the Amazon Kinesis Data Generator (KDG).

Check it out if you’re using Kinesis and need to do some load testing.

Comments closed

Storm In .Net

Published 2017-05-05 by Kevin Feasel

Ravi Peri explains how to use Apache Storm in .NET code on HDInsight:

Topology submissions can fail due to many reasons:

JDK is not installed or is not in the Path

Required java dependencies are not included

Incompatible java jar dependencies. Example: Storm-eventhub-spouts-9.jar is incompatible with Storm 1.0.1. If you submit a jar with that dependency, topolopgy submission will fail.

Duplicate names for topologies

/var/log/hdinsight-scpwebapi/hdinsight-scpwebapi.out file on active headnode will contain the error details.

At one point, I was big on Storm and really wanted a .NET client for Storm to take off. Nowadays, I’d rather use Spark Streaming or Kafka Streams for the same kind of streaming data work.

Comments closed

Kafka + Spark Streaming

Published 2017-05-02 by Kevin Feasel

Kunal Khamar, et al, show how to integrate Apache Kafka with Spark’s structured streaming:

Kafka is a distributed pub-sub messaging system that is popular for ingesting real-time data streams and making them available to downstream consumers in a parallel and fault-tolerant manner. This renders Kafka suitable for building real-time streaming data pipelines that reliably move data between heterogeneous processing systems. Before we dive into the details of Structured Streaming’s Kafka support, let’s recap some basic concepts and terms.

Data in Kafka is organized into topics that are split into partitions for parallelism. Each partition is an ordered, immutable sequence of records, and can be thought of as a structured commit log. Producers append records to the tail of these logs and consumers read the logs at their own pace. Multiple consumers can subscribe to a topic and receive incoming records as they arrive. As new records arrive to a partition in a Kafka topic, they are assigned a sequential id number called the offset. A Kafka cluster retains all published records—whether or not they have been consumed—for a configurable retention period, after which they are marked for deletion.

Read the whole thing.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28