Kalman Filters With Spark And Kafka

Konur Unyelioglu goes deep into Kalman filters:

In simple terms, a Kalman filter is a theoretical model to predict the state of a dynamic system under measurement noise. Originally developed in the 1960s, the Kalman filter has found applications in many different fields of technology including vehicle guidance and control, signal processing, transportation, analysis of economic data, and human health state monitoring, to name a few (see the Kalman filter Wikipedia page for a detailed discussion). A particular application area for the Kalman filter is signal estimation as part of time series analysis. Apache Spark provides a great framework to facilitate time series stream processing. As such, it would be useful to discuss how the Kalman filter can be combined with Apache Spark.

In this article, we will implement a Kalman filter for a simple dynamic model using the Apache Spark Structured Streaming engine and an Apache Kafka data source. We will use Apache Spark version 2.3.1 (the latest as of this writing), Java version 1.8, and Kafka version 2.0.0. The article is organized as follows: the next section gives an overview of the dynamic model and the corresponding Kalman filter; the section after that discusses the application architecture and deployment model and reviews the Java code that makes up the application's modules; then we show graphically how the Kalman filter performs by comparing the predicted variables to the measured variables under random measurement noise; and we wrap up with concluding remarks.
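If you want a feel for the moving parts before diving into the article, here is a minimal, hypothetical sketch (not the author's actual code) of the two building blocks involved: a one-dimensional Kalman filter that folds each noisy measurement into a running estimate via a predict/update cycle, and a Spark Structured Streaming query that reads measurements off a Kafka topic. The topic name, broker address, class names, and noise parameters are placeholder assumptions.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KalmanSketch {

    /** Minimal scalar Kalman filter: state x, variance p, process noise q, measurement noise r. */
    static final class ScalarKalmanFilter {
        private double x;        // current state estimate
        private double p;        // estimate variance
        private final double q;  // process noise variance
        private final double r;  // measurement noise variance

        ScalarKalmanFilter(double initialState, double initialVariance, double q, double r) {
            this.x = initialState;
            this.p = initialVariance;
            this.q = q;
            this.r = r;
        }

        /** Fold one noisy measurement z into the estimate and return the updated estimate. */
        double update(double z) {
            // Predict step (constant-state model): estimate carries over, uncertainty grows.
            double pPred = p + q;
            // Update step: the Kalman gain weighs the prediction against the new measurement.
            double k = pPred / (pPred + r);
            x = x + k * (z - x);
            p = (1 - k) * pPred;
            return x;
        }
    }

    public static void main(String[] args) throws Exception {
        // Offline demonstration of the filter on a few noisy readings.
        ScalarKalmanFilter filter = new ScalarKalmanFilter(0.0, 1.0, 1e-4, 0.25);
        for (double z : new double[] {1.2, 0.8, 1.1, 0.95}) {
            System.out.println("estimate after " + z + " -> " + filter.update(z));
        }

        SparkSession spark = SparkSession.builder()
                .appName("KalmanFilterSketch")
                .getOrCreate();

        // Subscribe to a Kafka topic of raw measurements; the topic and broker are placeholders,
        // and the spark-sql-kafka connector must be on the classpath.
        Dataset<Row> measurements = spark
                .readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "measurements")
                .load()
                .selectExpr("CAST(value AS STRING) AS measurement");

        // A real pipeline would apply the filter with a stateful operator such as
        // mapGroupsWithState and write the estimates to a sink; here we just echo
        // the raw stream to the console.
        measurements.writeStream()
                .format("console")
                .start()
                .awaitTermination();
    }
}
```

The article itself goes much further, working through a full dynamic model and the deployment architecture around the streaming job.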

This is going on my “reread carefully” list; it’s very interesting and goes deep into the topic.
