Press "Enter" to skip to content

Category: Streaming

Dynamic Routing with Kafka Streams

Yeva Byzek explains how you can use Kafka Streams to perform dynamic routing of messages:

A cleaner way is to provide the service with a separate stream that contains only the relevant subset of events that the microservice cares about. To achieve this, a streaming application can branch the original event stream into different substreams using the method KStream#branch(). This results in new Kafka topics, so then the microservice can subscribe to one of the branched streams directly.

For example, in the finance domain, consider a fraud remediation microservice that should process only the subset of events suspected of being fraudulent. As shown below, the original stream of events is branched into two new streams: one for suspicious events and one for validated events. This enables the fraud remediation microservice to process just the stream of suspicious events, without ever seeing the validated events.

Read on to learn more.
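To give a sense of what this looks like in the Kafka Streams DSL, here is a minimal sketch of branching a transaction stream inside a topology; the topic names, the Transaction type, and the isSuspicious() check are all hypothetical:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, Transaction> transactions = builder.stream("transactions");

    // Branch the original stream into substreams; each record goes to the first matching predicate.
    KStream<String, Transaction>[] branches = transactions.branch(
        (key, txn) -> isSuspicious(txn),  // hypothetical check for suspected fraud
        (key, txn) -> true                // everything else (validated)
    );

    // Write each substream to its own topic so a microservice can subscribe to just the one it cares about.
    branches[0].to("suspicious-transactions");
    branches[1].to("validated-transactions");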


Confluent Platform 5.2 Released

Mau Barra announces Confluent Platform 5.2:

Confluent Platform 5.2 represents a significant milestone in our efforts across three key dimensions:
1. It allows you to use the entire Confluent Platform free forever in single-broker Kafka clusters, so you are freer than ever to start building new event streaming applications right away. We are also bringing librdkafka 1.0 in order to bring our C/C++, Python, Go and .NET clients closer to parity with the Java client.
2. It adds critical enhancements to Confluent Control Center that will help you meet your event streaming SLAs in distributed Apache Kafka environments at greater scale.
3. With our latest version of Confluent Replicator, you can now seamlessly stream events across on-prem and public cloud deployments.

The top item is quite interesting: a free developer license and not just a 30-day trial.


Kafka Streams: Streams and Tables

Neha Bhardwaj explains a couple of the key abstractions in Kafka Streams:

In this blog, we’ll move one step forward to get an understanding of the Dual streaming model and see what abstractions KSQL uses to process the data.

All the data that we work on with KSQL is produced to Kafka topics by some client. This client can be any application, Kafka connector, etc., which produces continuous, never-ending data to the topics.

KSQL does not directly interact with these topics; rather, it introduces a couple of abstractions in between to process the data, known as Streams and Tables.

Read on to learn what these are and why it’s useful to think in these terms.
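The same duality shows up in the Kafka Streams Java API as KStream and KTable. A minimal sketch, with hypothetical topic names:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;

    StreamsBuilder builder = new StreamsBuilder();

    // A stream: an unbounded sequence of independent events (e.g., individual page views).
    KStream<String, String> pageViews = builder.stream("page-views");

    // A table: a changelog in which each key holds only its latest value (e.g., user profiles).
    KTable<String, String> userProfiles = builder.table("user-profiles");

    // Aggregating a stream yields a table: the per-key count is continuously updated as events arrive.
    KTable<String, Long> viewsPerUser = pageViews.groupByKey().count();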


Request: Add Support for Row Pattern Recognition

Itzik Ben-Gan would like to see Row Pattern Recognition make it into T-SQL:

The ISO/IEC 9075:2016 standard (aka SQL:2016) introduces support for Row Pattern Recognition (RPR) in SQL. Similar to using regular expressions to identify patterns in a string, RPR allows you to use regular expressions to identify patterns in a sequence of rows.

To me, it’s the next step in the evolution of window functions. If you think that window functions are profound and useful, RPR is really going to bake your noodle.

RPR has limitless practical applications, including identifying patterns in stock market activity, handling time series, fraud detection, material handling, shipping applications, DNA sequencing, gaps and islands, top N per group, and many others. 

I’ve voted it up and recommend you do so too. This is a great way to think of streams of data sitting in a database. If you have business use cases where this could help, adding those as comments would be great too.


Flink and Stateful Streaming

Himanshu Gupta explains some of the benefits Apache Flink offers for stateful streaming applications:

The two main types of stream processing are:
1. Stateless: every event is handled completely independently of the preceding events.
2. Stateful: a “state” is shared between events, and therefore past events can influence the way current events are processed.

Stateless stream processing is easy to scale up because events are processed independently, but stateful stream processing is difficult to scale up because the “state” needs to be shared across events.

Himanshu does point out alternatives, but this isn’t a comparison exercise.
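As a rough illustration of what stateful processing looks like in Flink’s Java API, here is a sketch that keeps a running count per key, so past events influence how the current one is handled (class and field names are made up):

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    public class RunningCount extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {
        private transient ValueState<Long> count;  // the "state" shared across events for a key

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void flatMap(Tuple2<String, Long> event, Collector<Tuple2<String, Long>> out) throws Exception {
            Long current = count.value() == null ? 0L : count.value();
            current += 1;              // past events influence the current result
            count.update(current);
            out.collect(Tuple2.of(event.f0, current));
        }
    }

Applied as stream.keyBy(e -> e.f0).flatMap(new RunningCount()), the state is scoped per key, which is exactly what makes scaling it out harder than the stateless case.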


Building a Power BI Dashboard on Streaming Data

Annie Xu shows us how to build a Power BI dashboard on a streaming data source in Azure:

This post is about something new I tried last week. The goal was to create a simulated streaming data source, feed it into Power BI as a streaming dataset, create a report out of the streaming dataset, and then embed it in a web application. With proper directions provided by my teammates, I finished the implementation from end to end within 1.5 hours. I was super impressed by how awesome it is and how easy it is to implement, so I want to share those directions with you.

The source data is simulated but the process is the same with real data sets.
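If you want to push the simulated rows yourself, Power BI’s push-style streaming datasets accept rows over a plain HTTP POST to the dataset’s push URL. A minimal sketch, where the URL placeholders and field names are hypothetical and must match your dataset’s definition:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class PushSimulatedRow {
        public static void main(String[] args) throws Exception {
            // Hypothetical push URL, copied from the streaming dataset's API info in Power BI.
            String pushUrl = "https://api.powerbi.com/beta/WORKSPACE_ID/datasets/DATASET_ID/rows?key=API_KEY";

            // One simulated row; the field names must match the streaming dataset's schema.
            String row = "[{\"timestamp\": \"2019-04-01T10:30:00Z\", \"value\": 42}]";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(pushUrl))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(row))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Status: " + response.statusCode());
        }
    }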


Using the StreamSets Snowflake Destination

Dash Desai shows how you can use StreamSets to write data into SnowflakeDB:

In particular, we’ll look at an example scenario that addresses Data Drift: when new information is added mid-stream, the new table structure and new column values are created in Snowflake automatically.

To illustrate, let’s take HTTP web server logs generated by Apache web server (for example) as our main source of data. Here’s what a typical log line looks like:
150.47.54.136 - - [14/Jun/2014:10:30:19 -0400] "GET /department/outdoors/category/kids'%20golf%20clubs/product/Polar%20Loop%20Activity%20Tracker HTTP/1.1" 200 1026 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36"

Click through for the demonstration.
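To give a sense of the fields hiding in a line like that (which StreamSets extracts for you), here is a rough sketch of pulling a few of them out with a combined-log-format regular expression; the example line is abbreviated:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class LogLineSketch {
        // Combined log format: client, identd, user, timestamp, request, status, bytes, referrer, user agent.
        private static final Pattern COMBINED = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"$");

        public static void main(String[] args) {
            String line = "150.47.54.136 - - [14/Jun/2014:10:30:19 -0400] "
                + "\"GET /department/outdoors HTTP/1.1\" 200 1026 \"-\" \"Mozilla/5.0\"";
            Matcher m = COMBINED.matcher(line);
            if (m.matches()) {
                System.out.println("client=" + m.group(1) + " status=" + m.group(6) + " bytes=" + m.group(7));
            }
        }
    }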


Database-First or Kafka-First for Event Streaming

Gwen Shapira takes us through a scenario where database-first writes for event streaming make the most sense:

Note that the DB does quite a lot for you: it enforces serializability, locks, your logical constraints, etc. If the DB is distributed (Vitess, Cockroach, Spanner, Yugabyte), it does even more.

If you were to go Kafka-first… well, it isn’t impossible. But all those responsibilities now belong to you as a developer. And if you are thinking there may be multiple webservers handling user requests and passing them to Kafka, you have to solve fairly challenging problems.

Read the whole thing.


Monitoring Kafka Streams with JMX Metrics

Rishi Khandelwal provides a reference architecture for monitoring a Kafka Streams application using JMX Metrics and pushing the results into Graphite:

The service (application) exposes its JMX metrics at some port, where they are captured by the Jolokia Java agent. Jolokia then exposes those metrics at a port that is easily accessible through a REST endpoint (we call it the Jolokia URL). Then we have jmx2graphite, which polls the metrics from the Jolokia URL and pushes them to Graphite. Finally, Grafana reads the Graphite metrics and creates a beautiful dashboard for us, along with the alerts.

So this is how the proposed monitoring solution works. Now let’s discuss the components of the monitoring solution.

There’s a bit of code/configuration in here as well, so check it out.
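As a quick look at the first link in that chain, the Kafka Streams metrics live under the kafka.streams JMX domain inside the application’s JVM; this small sketch lists the MBeans that an agent like Jolokia would expose (it only returns results when run inside a process with a started Kafka Streams topology):

    import java.lang.management.ManagementFactory;
    import java.util.Set;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class ListStreamsMBeans {
        public static void main(String[] args) throws Exception {
            // Query the platform MBean server for every MBean in the kafka.streams domain.
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            Set<ObjectName> names = server.queryNames(new ObjectName("kafka.streams:*"), null);
            names.forEach(name -> System.out.println(name.getCanonicalName()));
        }
    }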


Creating a UDF in KSQL

Saeed Barghi shows us how we can create user-defined functions in Kafka Streams’ KSQL language:

The UDF I want to implement here is DATEADD. If you’re familiar with SQL, you have definitely used it: it takes in a date, adds or subtracts a specified number to a specific part of the datetime, and spits out a new datetime.

To implement a User Defined Function (UDF or UDAF), you need to code your function in Java and then import the jar file into your KSQL server. You can read about the full process here; I point out a couple of things that I believe you should pay attention to:

I think I prefer Spark’s method for UDFs in Spark SQL: create it in Scala and register the function.
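For a feel of the shape of such a function, here is a rough sketch of a KSQL UDF in Java; the class, name, and behavior (adding days to an epoch-millis timestamp) are illustrative rather than the author’s actual DATEADD implementation:

    import io.confluent.ksql.function.udf.Udf;
    import io.confluent.ksql.function.udf.UdfDescription;
    import java.util.concurrent.TimeUnit;

    @UdfDescription(name = "dateadd_days", description = "Adds a number of days to an epoch-millis timestamp")
    public class DateAddDays {
        @Udf(description = "Returns the timestamp plus the given number of days, in epoch milliseconds")
        public long dateAddDays(final long epochMillis, final int days) {
            return epochMillis + TimeUnit.DAYS.toMillis(days);
        }
    }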
