Filtering On Kafka Streams

Robin Moffatt has a new series showing how to use Kafka Streams for dealing with syslog data:

syslog is one of those ubiquitous standards on which much of modern computing runs. Built into operating systems such as Linux, it’s also commonplace in networking and IoT devices like IP cameras. It provides a way for streaming log messages, along with metadata such as the source host, severity of the message, and so on. Sometimes the target is simply a local logfile, but more often it’s a centralised syslog server which in turn may log or process the messages further.

As a high-performance, distributed streaming platform, Apache Kafka® is a great tool for centralised ingestion of syslog data. Since Apache Kafka also persists data and supports native stream processing we don’t need to land it elsewhere before we can utilise the data. You can stream syslog data into Kafka in a variety of ways, including through Kafka Connect for which there is a dedicated syslog plugin.

In this post, we’re going to see how KSQL can be used to process syslog messages as they arrive in realtime.

Check it out.

Related Posts

From pandas to Spark with koalas

Achilleus tries out Koalas: Python is widely used programming language when it comes to Data science workloads and Python has way too many different libraries to back this fact. Most of the data scientists are familiar with Python and pandas mostly. But the main issue with Pandas is it works great for small and medium […]

Read More

Overriding Spark Dependencies

Landon Robinson shows how to override a Spark dependency located on the classpath: This doesn’t draw the line exactly where the method changed from private to public, but generally speaking:– gson-2.2.4.jar: the method is private, and therefore too old for use here– gson-2.6.1: the method is public, and works fine.– Somewhere between the two, the […]

Read More

Categories

April 2018
MTWTFSS
« Mar May »
 1
2345678
9101112131415
16171819202122
23242526272829
30