Real-Time Streaming ETL With Kafka Streams

Yeva Byzek has a tutorial using Kafka and Kafka Streams to perform real-time ETL:

Let’s consider an application that does some real-time stateful stream processing with the Kafka Streams API. We’ll run through a specific example of the end-to-end reference architecture and show you how to:

  • Run a Kafka source connector to read data from another system (a SQLite3 database), then modify the data in-flight using Single Message Transforms (SMTs) before writing it to the Kafka cluster

  • Process and enrich the data from a Java application using the Kafka Streams API (e.g. count and sum)

  • Run a Kafka sink connector to write data from the Kafka cluster to another system (AWS S3)

Read the whole thing.

Related Posts

Keep That Data Raw

Archana Madhavan argues that you should retain your raw data: When your pipeline already has to read every line of your data, it’s tempting to make it perform some fancy transformations. But you should steer clear of these add-ons so that you: Avoid flawed calculations. If you have thousands of machines running your pipeline in real-time, […]

Read More

Long-Term Storage In Kafka

Jay Kreps shows us that you can use Kafka as a primary data store: The short answer is that it’s not insane, people do this all the time, and Kafka was actually designed for this type of usage. But first, why might you want to do this? There are actually a number of use cases, […]

Read More


June 2017
« May Jul »