I identify five types of workloads in Apache Kafka, and in my opinion each one corresponds to a specific API:
Kafka Producer API: Applications directly producing data (ex: clickstream, logs, IoT).
Kafka Connect Source API: Applications bridging between a datastore we don’t control and Kafka (ex: CDC from Postgres or MongoDB, Twitter, a REST API).
Kafka Streams API / KSQL: Applications that consume from Kafka and produce back into Kafka, also called stream processing. Use KSQL if you can express your real-time job in SQL-like statements; use the Kafka Streams API if you will need to write complex logic for your job.
Kafka Consumer API: Applications reading a stream and performing real-time actions on it (e.g. sending an email).
Kafka Connect Sink API: Applications reading a stream and storing it into a target datastore (ex: Kafka to S3, Kafka to HDFS, Kafka to PostgreSQL, Kafka to MongoDB, etc.)
Stephane then goes into detail on each of these.
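To make the Connect Source API concrete, here is a sketch of what a CDC source connector configuration could look like, using Debezium’s PostgreSQL connector as one example. The connector class and property names are Debezium’s; the hostname, credentials, and database name are placeholder assumptions, and newer Debezium versions use `topic.prefix` where older ones used `database.server.name`:

```json
{
  "name": "postgres-cdc-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres.example.com",
    "database.port": "5432",
    "database.user": "replicator",
    "database.password": "change-me",
    "database.dbname": "inventory",
    "topic.prefix": "pg"
  }
}
```

Posting a config like this to the Connect REST API (`POST /connectors`) streams change events from the database into Kafka topics, with no producer code to write — which is exactly the “datastore we don’t control” case above.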
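As an illustration of the SQL-like style the KSQL recommendation refers to, a continuous aggregation might look like the sketch below (the topic, stream, and column names are made up for the example):

```sql
-- Declare a stream over an existing Kafka topic
CREATE STREAM clickstream (user_id VARCHAR, url VARCHAR)
  WITH (KAFKA_TOPIC='clicks', VALUE_FORMAT='JSON');

-- Continuously count clicks per user; results are written back to Kafka
CREATE TABLE clicks_per_user AS
  SELECT user_id, COUNT(*) AS click_count
  FROM clickstream
  GROUP BY user_id;
```

If the job instead needed custom state handling or arbitrary logic, that is where the Kafka Streams API fits, per the recommendation above.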
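Similarly, a Connect Sink sketch — here using the Confluent S3 sink connector, with the bucket, region, and topic names as placeholder assumptions:

```json
{
  "name": "s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "clicks",
    "s3.bucket.name": "my-kafka-archive",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000"
  }
}
```

The sink side mirrors the source side: declarative configuration rather than consumer code, for the common “Kafka to datastore” archival path.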