Kafka For Pythonistas

Matt Howlett has an introduction to Apache Kafka, designed for Python developers:

In the call to the produce method, both the key and value parameters need to be either a byte-like object (in Python 2.x this includes strings), a Unicode object, or None. In Python 3.x, strings are Unicode and will be converted to a sequence of bytes using the UTF-8 encoding. In Python 2.x, objects of type unicode will be encoded using the default encoding. Often, you will want to serialize objects of a particular type before writing them to Kafka. A common pattern for doing this is to subclass Producer and override the produce method with one that performs the required serialization.

The produce method returns immediately without waiting for confirmation that the message has been successfully produced to Kafka (or otherwise). The flush method blocks until all outstanding produce commands have completed, or the optional timeout (specified as a number of seconds) has been exceeded. You can test to see whether all produce commands have completed by checking the value returned by the flush method: if it is greater than zero, there are still produce commands that have yet to complete. Note that you should typically call flush only at application teardown, not during normal flow of execution, as it will prevent requests from being streamlined in a performant manner.

This is a fairly gentle introduction to the topic if you’re already familiar with Python and have a familiarity with message broker systems.

Related Posts

Stream-To-Stream Joins In Spark

Ayush Tiwari shows how to join a pair of streams in Apache Spark 2.3: In Spark 2.3, it added support for stream-stream joins, i.e, we can join two streaming Datasets/DataFrames and in this blog we are going to see how beautifully spark now give support for joining the two streaming dataframes. I this example, I […]

Read More

Spark: DataFrame To RDD For Data Cleansing

Gilad Moscovitch walks us through a common data cleansing problem with Spark data frames: A problem can arise when one of the inner fields of the json,┬áhas undesired non-json values in some of the records. For instance, an inner field might contains HTTP errors, that would be interpreted as a string, rather than as a […]

Read More


June 2017
« May Jul »