Change Data Capture With Apache NiFi

Kevin Feasel

2016-09-15

ETL, Hadoop

Satish Bomma uses Apache NiFi to perform change data capture on a MySQL database:

The main things to configure are the DBCPConnectionPool and Maximum-value Columns

Please choose this to be the date-time stamp column that could be a cumulative change-management column

This is the only limitation with this processor, as it is not a true CDC and relies on one column. If the data is reloaded into the column with older data, the data will not be replicated into HDFS or any other destination.

This processor does not rely on transactional logs or redo logs like Attunity or Oracle GoldenGate. For a complete solution for CDC please use Attunity or Oracle GoldenGate solutions.

That last paragraph in the snippet is key: it’s not a true replacement for CDC-friendly products. It is, however, a good example for showing how to use NiFi to connect to a relational database and pump data out of it.
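
To make that limitation concrete, here is a minimal sketch of the maximum-value-column idea in Python. It is not NiFi's implementation: SQLite stands in for MySQL so the example runs anywhere, and the `orders` table, the `updated_at` column, and the `fetch_changes` helper are all hypothetical names chosen for illustration.

```python
import sqlite3


def fetch_changes(conn, table, max_value_column, last_max):
    """Return rows whose max_value_column exceeds last_max, plus the new high-water mark."""
    # Table and column names are interpolated directly: acceptable for a
    # sketch, not for untrusted input.
    sql = f"SELECT * FROM {table}"
    params = ()
    if last_max is not None:
        sql += f" WHERE {max_value_column} > ?"
        params = (last_max,)
    sql += f" ORDER BY {max_value_column}"
    cur = conn.execute(sql, params)
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]
    # Keep the old mark when nothing new arrived.
    new_max = rows[-1][max_value_column] if rows else last_max
    return rows, new_max


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "new", "2016-09-01 10:00:00"), (2, "new", "2016-09-02 10:00:00")],
    )
    # First run: no stored mark yet, so every row is fetched.
    first, mark = fetch_changes(conn, "orders", "updated_at", None)

    # A row with a newer timestamp is picked up on the next run.
    conn.execute("INSERT INTO orders VALUES (3, 'new', '2016-09-03 10:00:00')")
    # A row reloaded with an older timestamp falls below the mark and is missed.
    conn.execute("INSERT INTO orders VALUES (4, 'backfill', '2016-08-15 10:00:00')")
    # A delete leaves no row to select, so it is never seen either.
    conn.execute("DELETE FROM orders WHERE id = 1")

    second, mark = fetch_changes(conn, "orders", "updated_at", mark)
    return [r["id"] for r in first], [r["id"] for r in second], mark


if __name__ == "__main__":
    print(demo())
```

In this sketch the second run returns only row 3. The backfilled row 4 and the deletion of row 1 never reach the destination, which is the gap that log-based CDC tools are meant to close.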
