Streaming ETL In Practice Using KSQL

Robin Moffatt builds an example of streaming ETL using Oracle, GoldenGate, and Kafka:

So in this post I’m going to show an example of what streaming ETL looks like in practice. I’m replacing batch extracts with event streams, and batch transformation with in-flight transformation of these event streams. We’ll take a stream of data from a transactional system built on Oracle, transform it, and stream it into Elasticsearch to land the results to, but your choice of datastore is up to you—with Kafka’s Connect API you can stream the data to almost anywhere! Using KSQL we’ll see how to filter streams of events in real-time from a database, how to join between events from two database tables, and how to create rolling aggregates on this data.

It’s a very useful example.

Related Posts

Writing To Elasticsearch With Spark Streaming

Anuj Saxena has an example of writing data from a Spark Streaming pipeline out to Elasticsearch: There’s been a lot of time we have been working on streaming data. Using Apache Spark for that can be much convenient. Spark provides two APIs for streaming data one is Spark Streaming which is a separate library provided […]

Read More

Databricks Delta Now Available On Azure

Cihan Biyikoglu and Singh Garewal announce the availability of Databricks Delta on Azure Databricks: Using an innovative new table design, Delta supports both batch and streaming use cases with high query performance and strong data reliability while requiring a simpler data pipeline architecture: Increased query performance – Able to deliver 10 to 100 times faster performance […]

Read More

Categories

February 2018
MTWTFSS
« Jan Mar »
 1234
567891011
12131415161718
19202122232425
262728