Streaming ETL In Practice Using KSQL

Robin Moffatt builds an example of streaming ETL using Oracle, GoldenGate, and Kafka:

So in this post I’m going to show an example of what streaming ETL looks like in practice. I’m replacing batch extracts with event streams, and batch transformation with in-flight transformation of these event streams. We’ll take a stream of data from a transactional system built on Oracle, transform it, and stream it into Elasticsearch to land the results to, but your choice of datastore is up to you—with Kafka’s Connect API you can stream the data to almost anywhere! Using KSQL we’ll see how to filter streams of events in real-time from a database, how to join between events from two database tables, and how to create rolling aggregates on this data.

It’s a very useful example.

Related Posts

Multi-Region Replication with Confluent Platform

David Arthur walks us through multi-region replication of Kafka clusters in the Confluent Platform 5.4 preview: Running a single Apache Kafka® cluster across multiple datacenters (DCs) is a common, yet somewhat taboo architecture. This architecture, referred to as a stretch cluster, provides several operational benefits and unlocks the door to many uses cases. Stretch clusters provide […]

Read More

Diagnosing TCP SACKs-Related Slowdown in Databricks

Chris Stevens, et al, walk us through troubleshooting a slowdown after using Linux images which have been patched for the TCP SACKs vulnerabilities: In order to figure out why the straggler task took 15 minutes, we needed to catch it in the act. We reran the benchmark while monitoring the Spark UI, knowing that all […]

Read More

Categories

February 2018
MTWTFSS
« Jan Mar »
 1234
567891011
12131415161718
19202122232425
262728