Log Aggregation With Kafka And Redis

Asaf Yigal has a two-part series on comparing Apache Kafka and Redis for moving log events into Elasticsearch.  Part 1 explains the technologies:

Redis is a bit different from Kafka in terms of its storage and various functionalities. At its core, Redis is an in-memory data store that can be used as a high-performance database, a cache, and a message broker. It is perfect for real-time data processing.

The various data structures supported by Redis are strings, hashes, lists, sets, and sorted sets. Redis also has various clients written in several languages which can be used to write custom programs for the insertion and retrieval of data. This is an advantage over Kafka since Kafka only has a Java client. The main similarity between the two is that they both provide a messaging service. But for the purpose of log aggregation, we can use Redis’ various data structures to do it more efficiently.

Part 2 compares the two technologies and explains which works better when:

Kafka heavily relies on the machine memory (RAM). As we see in the previous graph, utilizing the memory and storage is an optimal way to maintain a steady throughput. Its performance depends on the data consumption rate. In the case that consumers don’t consume data fast enough, Kafka will have to read from a disk and not from memory which will slow down its performance.

As you might expect, the answer for which technology to use is “it depends.”

Related Posts

Kafka 2.3 and Kafka Connect Improvements

Robin Moffatt goes over improvements in Kafka Connect with the release of Apache Kafka 2.3: A Kafka Connect cluster is made up of one or more worker processes, and the cluster distributes the work of connectors as tasks. When a connector or worker is added or removed, Kafka Connect will attempt to rebalance these tasks. Before version 2.3 of Kafka, […]

Read More

The Databricks File System

Brad Llewellyn takes us through the Azure Databricks File System: Today, we’re going to talk about the Databricks File System (DBFS) in Azure Databricks.  If you haven’t read the previous posts in this series, Introduction, Cluster Creation and Notebooks, they may provide some useful context.  You can find the files from this post in our GitHub Repository.  Let’s move on […]

Read More


December 2016
« Nov Jan »