Using Kafka And Elasticsearch For IoT Data

Angelos Petheriotis talks about building an IoT structure which handles ten billion messages per day:

We splitted the pipeline into 2 main units: The aggregator job and the persisting job. The aggregator has one and only one responsibility. To read from the input kafka topic, process the messages and finally emit them to a new kafka topic. The persisting job then takes over and whenever a message is received from topic temperatures.aggregated it persists to elasticsearch.

The above approach might seem to be an overkill at first but it provides a lot of benefits (but also some drawbacks). Having two units means that each unit’s health won’t directly affect each other. If the processing job fails due OOM, the persisting job will still be healthy.

One major benefit we’ve seen using this approach is the replay capabilities this approach offers. For example, if at some point we need to persist the messages from temperatures.aggregated to Cassandra, it’s just a matter of wiring a new pipeline and start consuming the kafka topic. If we had one job for processing and persisting, we would have to reprocess every record from the, which comes with a great computational and time cost.

Angelos also discusses some issues he and his team had with Spark Streaming on this data set, so it’s an interesting comparison.

Related Posts

Event Sourcing On Kafka

Adam Warski shows how you can use Apache Kafka as your event sourcing data source: There’s a number of great introductory articles, so this is going to be a very brief introduction. With event sourcing, instead of storing the “current” state of the entities that are used in our system, we store a stream of events that relate to these […]

Read More

The Basics Of Kafka Security

Stephane Maarek has a nice post covering some of the basics of securing an Apache Kafka cluster: Once your Kafka clients are authenticated, Kafka needs to be able to decide what they can and cannot do. This is where Authorization comes in, controlled by Access Control Lists (ACL). ACL are what you expect them to be: […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *


March 2018
« Feb