Apache describes Kafka as a distributed streaming platform that lets us:
Publish and subscribe to streams of records.
Store streams of records in a fault-tolerant way.
Process streams of records as they occur.
Kafka is probably the most generally interesting of the current Hadoop ecosystem, with Spark not too far behind. By “generally interesting,” I mean in the sense that companies with no vested interest in Hadoop as a whole could still be excited by the prospect of Kafka.