Real-Time Streaming ETL With Kafka Streams

Yeva Byzek has a tutorial using Kafka and Kafka Streams to perform real-time ETL:

Let’s consider an application that does some real-time stateful stream processing with the Kafka Streams API. We’ll run through a specific example of the end-to-end reference architecture and show you how to:

  • Run a Kafka source connector to read data from another system (a SQLite3 database), then modify the data in-flight using Single Message Transforms (SMTs) before writing it to the Kafka cluster

  • Process and enrich the data from a Java application using the Kafka Streams API (e.g. count and sum)

  • Run a Kafka sink connector to write data from the Kafka cluster to another system (AWS S3)

Read the whole thing.

Related Posts

How .NET Code Talks to Spark

Ed Elliott has a great diagram showing how user-written .NET code communicates with Spark over the Java VM: 4. Spark-dotnet Java driver listens on tcp portThe spark-dotnet Java driver listens on a TCP socket. This socket is used to communicate between the Java VM and the dotnet code, the dotnet code doesn’t run in the […]

Read More

Cloudera and 100% Open Source Software

Alex Woodie notes a change at Cloudera: The old Cloudera developed and distributed its Hadoop stack using a mix of open source and proprietary methods and licenses. But the new Cloudera will be 100% open source, just like Hortonworks, its one-time Hadoop rival that it acquired in January. But will developing its data platform completely […]

Read More

Categories

June 2017
MTWTFSS
« May Jul »
 1234
567891011
12131415161718
19202122232425
2627282930