.NET Producer For Kafka

Kevin Feasel



I build a simple .NET console app to push messages to a Kafka topic:

That’s the core of our code.  The main function instantiates a new Kafka producer and gloms onto the Flights topic.  From there, we call the loadEntries function.  The loadEntries function takes a topic and filename.  It streams entries from the 2008.csv file and uses the ParallelSeq library to operate in parallel on data streaming in (one of the nice advantages of using functional code:  writing thread-safe code is easy!).  We filter out any records whose length is zero—there might be newlines somewhere in the file, and those aren’t helpful.  We also want to throw away the header row (if it exists) and I know that that starts with “Year” whereas all other records simply include the numeric year value.  Finally, once we throw away garbage rows, we want to call the publish function for each entry in the list.  The publish function encodes our text as a UTF-8 bytestream and pushes the results onto our Kafka topic.

All this plus a bonus F# pitch.

Related Posts

Flint: Time Series With Spark

Li Jin and Kevin Rasmussen cover the concepts of Flint, a time-series library built on Apache Spark: Time series analysis has two components: time series manipulation and time series modeling. Time series manipulation is the process of manipulating and transforming data into features for training a model. Time series manipulation is used for tasks like data […]

Read More

ElasticMapReduce And RStudio

Tanzir Musabbir demonstrates how to set up Amazon ElasticMapReduce to include an RStudio edge node: RStudio Server provides a browser-based interface for R and a popular tool among data scientists. Data scientist use Apache Spark cluster running on  Amazon EMR to perform distributed training. In a previous blog post, the author showed how you can install RStudio Server on Amazon […]

Read More


October 2016
« Sep Nov »