Real-Time Data Streaming and Apache Kafka

Kai Waehner explains how Apache Kafka is not real-time:

Real-time data beats slow data. It is that easy! But what is real-time? The term always needs to be defined when discussing a use case. Apache Kafka is the de facto standard for real-time data streaming. Kafka is good enough for almost all real-time scenarios. But dedicated proprietary software is required for niche use cases. Kafka is NOT the right choice if you need microsecond latency! This article explores the architecture of NASDAQ that combines critical stock exchange trading with low-latency streaming analytics.

Kai uses the much more appropriate term “near real-time,” which I agree with. My mental example of “real-time” is software that you’d put on a fighter jet (which was an actual example in my undergrad days of a real-time operating system). If people potentially die because your software takes 4 milliseconds to do a job it needs to do in 100 microseconds, that’s real-time. For most of us, near real-time is certainly enough.

Actually, I’d go one step further: for most of us, not-really-real-time is fine. So many cases of “The users needs this data in real time!” boil down to “The users really only look at this once a day and couldn’t act on faster information and some of our data sources only update once a day.” Swap ‘once a day’ with ‘once an hour’ or something like that and you have the large majority of projects which started out with “near real-time” requirements.

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Real-Time Data Streaming and Apache Kafka

One Comment