Real-Time Streaming ETL With Kafka Streams

Yeva Byzek has a tutorial using Kafka and Kafka Streams to perform real-time ETL:

Let’s consider an application that does some real-time stateful stream processing with the Kafka Streams API. We’ll run through a specific example of the end-to-end reference architecture and show you how to:

  • Run a Kafka source connector to read data from another system (a SQLite3 database), then modify the data in-flight using Single Message Transforms (SMTs) before writing it to the Kafka cluster

  • Process and enrich the data from a Java application using the Kafka Streams API (e.g. count and sum)

  • Run a Kafka sink connector to write data from the Kafka cluster to another system (AWS S3)

Read the whole thing.

Related Posts

Temporal Tables with Flink

Marta Paes shows off a new feature in Apache Flink: In the 1.7 release, Flink has introduced the concept of temporal tables into its streaming SQL and Table API: parameterized views on append-only tables — or, any table that only allows records to be inserted, never updated or deleted — that are interpreted as a changelog and […]

Read More

Azure Data Factory: Mapping and Wrangling Data Flows

Cathrine Wilhelmsen explains the difference between Mapping Data Flows and Wrangling Data Flows in Azure Data Factory: Now, we all know that the consultant answer to “which should I use?” is It Depends ™ 🙂 But what does it depend on? To me, it boils down to a few key questions you need to ask:– What is […]

Read More

Categories

June 2017
MTWTFSS
« May Jul »
 1234
567891011
12131415161718
19202122232425
2627282930