Enriching Syslog Data In A Kafka Pipeline

Robin Moffatt continues his syslog processing series with Kafka and KSQL:

In this article we’re going to conclude our fun with syslog data by looking at how we can enrich inbound streams of syslog data with reference information from elsewhere to produce a real-time enriched data stream. The syslog data in this example comes from various servers and network devices, and the additional information with which we’re going to enrich it is from MongoDB, which happens to be the datastore used by Ubiquiti network devices. With the enriched data we’re going to drive some real-time analytics through Elasticsearch and Kibana, as well as trigger push notifications based on activity of certain devices on the network.

I’ve enjoyed this series—it was a full, end-to-end look at a realistic business problem in Kafka Streams.  If you want to get started with Kafka Streams, I’d be hard-pressed to find a better example.

Related Posts

Pivoting Spark DataFrames

Unmesha Sreeveni shows how we can pivot a DataFrame in Apache Spark using one line of code: A pivot can be thought of as translating rows into columns while applying one or more aggregations. Lets see how we can achieve the same using the above dataframe. We will pivot the data based on “Item” column. […]

Read More

Troubleshooting Spark Performance

Bikas Saha and Mridul Murlidharan explain some of the basics of performance tuning with Apache Spark: Our objective was to build a system that would provide an intuitive insight into Spark jobs that not just provides visibility but also codifies the best practices and deep experience we have gained after years of debugging and optimizing […]

Read More


June 2018
« May Jul »