Change Data Capture With Apache NiFi

Kevin Feasel

2016-09-15

ETL, Hadoop

Satish Bomma uses Apache NiFi to perform change data capture on a MySQL database:

The main things to configure is DBCPConnection Pool and Maximum-value Columns

Please choose this to be the date-time stamp column that could be a cumulative change-management column

This is the only limitation with this processor as it is not a true CDC and relies on one column. If the data is reloaded into the column with older data the data will not be replicated into HDFS or any other destination.

This processor does not rely on Transactional logs or redo logs like Attunity or Oracle Goldengate. For a complete solution for CDC please use Attunity or Oracle Goldengate solutions.

That last paragraph in the snippet is key:  it’s not a true replacement for CDC-friendly products.  It is, however, a good example for showing how to use NiFi to connect to a relational database and pump data out of it.

Related Posts

Building TensorFlow Neural Networks On Spark With Keras

Jules Damji has an example of using the PyCharm IDE to use Keras to build TensorFlow neural network models on the Databricks MLflow library: Our example in the video is a simple Keras network, modified from Keras Model Examples, that creates a simple multi-layer binary classification model with a couple of hidden and dropout layers and […]

Read More

Hortonworks Data Platform 3.0 Released

Saumitra Buragohain, et al, announce the newest version of the Hortonworks Data Platform: Highlighted Apache Hive features include: Workload management for LLAP:  You can assign resource pools within LLAP pool and allocate resources on a per user or per group basis. This enables support for large multi-tenant deployments. ACID v2 and ACID on by default:  We are […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930