Prashant Sharma explains the basics of Apache Kafka:
Apache describes Kafka as a distributed streaming platform that lets us:
- Publish and subscribe to streams of records.
- Store streams of records in a fault-tolerant way.
- Process streams of records as they occur.
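
To make those three capabilities a little more concrete, here is a toy, in-memory sketch of the underlying idea: an append-only log that consumers read at their own pace. It is not Kafka's API. The `ToyLog` class, its `publish` and `poll` methods, and the consumer names are all hypothetical, and the sketch ignores the distributed and fault-tolerant parts (partitions, replication, brokers) entirely.

```python
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory stand-in for a single topic (not Kafka's API)."""

    def __init__(self):
        self.records = []                # append-only store of records
        self.offsets = defaultdict(int)  # next offset to read, per consumer name

    def publish(self, record):
        """Append a record and return the offset it was stored at."""
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, consumer):
        """Return the records this consumer has not yet seen, then advance its offset."""
        start = self.offsets[consumer]
        batch = self.records[start:]
        self.offsets[consumer] = len(self.records)
        return batch


log = ToyLog()
log.publish("order-1")
log.publish("order-2")

first = log.poll("billing")   # billing reads everything published so far
log.publish("order-3")
second = log.poll("billing")  # billing only receives the new record
replay = log.poll("audit")    # a new consumer starts from the beginning of the log
```

Because records stay in the log after they are read, each consumer tracks its own position, and a consumer that joins late can still replay the full history. That separation between stored records and per-consumer offsets is the part of the model this sketch is meant to show.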
Kafka is probably the most generally interesting part of the current Hadoop ecosystem, with Spark not too far behind. By “generally interesting,” I mean that companies with no vested interest in Hadoop as a whole could still be excited by the prospect of Kafka.