Stephane Maarek helps us make sense of when to use which Kafka API:
I identify 5 types of workloads in Apache Kafka, and in my opinion each corresponds to a specific API:
- Kafka Producer API: Applications directly producing data (ex: clickstream, logs, IoT).
- Kafka Connect Source API: Applications bridging between a datastore we don’t control and Kafka (ex: CDC, Postgres, MongoDB, Twitter, REST API).
- Kafka Streams API / KSQL: Applications wanting to consume from Kafka and produce back into Kafka, also called stream processing. Use KSQL if you think you can write your real-time job as SQL-like, use Kafka Streams API if you think you’re going to need to write complex logic for your job.
- Kafka Consumer API: Read a stream and perform real-time actions on it (e.g. send email…)
- Kafka Connect Sink API: Read a stream and store it into a target store (ex: Kafka to S3, Kafka to HDFS, Kafka to PostgreSQL, Kafka to MongoDB, etc.)
Stephane then goes into detail on each of these.
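As a rough way to keep the five workloads straight, the mapping can be written as a small lookup on where the data comes from and where it goes. This is only a sketch of the rule of thumb quoted above, not anything from Kafka itself: the `Endpoint` enum, the `choose_api` function, and the `sql_like` flag are hypothetical names made up for illustration, and real systems will not always fit so neatly into one bucket.

```python
from enum import Enum


class Endpoint(Enum):
    """Hypothetical labels for where data starts or ends up."""
    APPLICATION = "application"        # code we write and control
    EXTERNAL_STORE = "external store"  # a datastore or system we don't control
    KAFKA = "kafka"


def choose_api(source: Endpoint, sink: Endpoint, sql_like: bool = False) -> str:
    """Return the Kafka API suggested by the five-workload rule of thumb.

    sql_like only matters for Kafka-to-Kafka jobs: True means the job can be
    expressed as SQL-like logic, False means it needs more complex code.
    """
    if source is Endpoint.KAFKA and sink is Endpoint.KAFKA:
        # Stream processing: consume from Kafka and produce back into Kafka.
        return "KSQL" if sql_like else "Kafka Streams API"
    if sink is Endpoint.KAFKA:
        # Getting data into Kafka.
        if source is Endpoint.APPLICATION:
            return "Kafka Producer API"
        return "Kafka Connect Source API"
    if source is Endpoint.KAFKA:
        # Getting data out of Kafka.
        if sink is Endpoint.APPLICATION:
            return "Kafka Consumer API"
        return "Kafka Connect Sink API"
    raise ValueError("Kafka is neither the source nor the sink")


if __name__ == "__main__":
    print(choose_api(Endpoint.EXTERNAL_STORE, Endpoint.KAFKA))
    print(choose_api(Endpoint.KAFKA, Endpoint.KAFKA, sql_like=True))
    print(choose_api(Endpoint.KAFKA, Endpoint.EXTERNAL_STORE))
```

For example, a CDC feed from Postgres into Kafka is `choose_api(Endpoint.EXTERNAL_STORE, Endpoint.KAFKA)`, while a job that reads a topic and sends emails is `choose_api(Endpoint.KAFKA, Endpoint.APPLICATION)`.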