Press "Enter" to skip to content

RDBMS To Hive Via Kafka

Rajesh Nadipalli shows how to use Kafka to read relational database data and feed it to Hive:

Processes that publish messages to a Kafka topic are called “producers.” “Topics” are feeds of messages in categories that Kafka maintains. The transactions from RDBMS will be converted to Kafka topics. For this example, let’s consider a database for a sales team from which transactions are published as Kafka topics. The following steps are required to set up the Kafka producer

I’d call this a non-trivial but still straightforward exercise.  Step 1 from the SQL Server side could be reading from transaction logs (which would be the least-intrusive), but you could also set up something like change tracking and fire off messages when important tables’ records change.