Kalman Filters With Spark And Kafka

Konur Unyelioglu goes deep into Kalman filters:

In simple terms, a Kalman filter is a theoretical model to predict the state of a dynamic system under measurement noise. Originally developed in the 1960s, the Kalman filter has found applications in many different fields of technology including vehicle guidance and control, signal processing, transportation, analysis of economic data, and human health state monitoring, to name a few (see the Kalman filter Wikipedia page for a detailed discussion). A particular application area for the Kalman filter is signal estimation as part of time series analysis. Apache Spark provides a great framework to facilitate time series stream processing. As such, it would be useful to discuss how the Kalman filter can be combined with Apache Spark.
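To make "predict the state of a dynamic system under measurement noise" concrete, here is a minimal sketch of a one-dimensional Kalman filter. It is written in Python for brevity; the article itself uses Java. It assumes a hypothetical random-walk model, not the author's dynamic model. The hidden value x drifts with process-noise variance q, and each measurement z equals x plus noise with variance r. Every step first predicts, so the uncertainty p grows by q. It then updates: the gain k = p / (p + r) decides how far the estimate moves toward the new measurement. The class and parameter names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Hypothetical 1-D Kalman filter for a random-walk state.

    Assumed model: x_k = x_{k-1} + w_k,  w_k ~ N(0, q)   (process)
                   z_k = x_k + v_k,      v_k ~ N(0, r)   (measurement)
    """
    x: float  # current state estimate
    p: float  # variance (uncertainty) of the estimate
    q: float  # process noise variance
    r: float  # measurement noise variance

    def step(self, z: float) -> float:
        # Predict: the state is expected to stay put, so only uncertainty grows.
        self.p += self.q
        # Update: the Kalman gain weighs the prediction against the measurement.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        # The posterior variance shrinks by the fraction of trust given to z.
        self.p *= 1.0 - k
        return self.x
```

With q = 0 the filter reduces to a running weighted average. Starting from x = 0, p = 1, r = 1 and feeding three measurements of 1.0 yields an estimate of 0.75, which is the mean of the prior and the three readings. Raising q lets the filter follow real changes in the signal faster, at the cost of letting more measurement noise through.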

In this article, we will implement a Kalman filter for a simple dynamic model using the Apache Spark Structured Streaming engine and an Apache Kafka data source. We will use Apache Spark 2.3.1 (the latest release at the time of writing), Java 1.8, and Kafka 2.0.0. The article is organized as follows: the next section gives an overview of the dynamic model and the corresponding Kalman filter; the following section discusses the application architecture and its deployment model, and reviews the Java code for the application's modules; then we show graphically how the Kalman filter performs by comparing the predicted variables to the measured variables under random measurement noise; finally, we wrap up with concluding remarks.
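In a streaming deployment like the one described, each signal's filter state (estimate and variance) has to survive from one micro-batch to the next. The sketch below simulates that idea in plain Python under stated assumptions. A dict stands in for the engine's state store. Records are hypothetical (key, measurement) pairs, for example a Kafka message key and a numeric value. The same random-walk scalar model is assumed throughout. In Spark 2.3's Java and Scala APIs, per-key state of this kind is usually kept with `mapGroupsWithState` or `flatMapGroupsWithState`. The excerpt above does not say which mechanism the article uses, so treat that mapping as an assumption.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-key filter state: (estimate, variance).
State = Tuple[float, float]


def kalman_step(state: State, z: float, q: float, r: float) -> State:
    """One predict/update cycle of a scalar random-walk Kalman filter."""
    x, p = state
    p = p + q                  # predict: uncertainty grows by process noise
    k = p / (p + r)            # Kalman gain
    return x + k * (z - x), (1.0 - k) * p


def process_micro_batch(
    batch: Iterable[Tuple[str, float]],
    store: Dict[str, State],
    q: float = 1e-4,
    r: float = 1.0,
    initial: State = (0.0, 1.0),
) -> List[Tuple[str, float, float]]:
    """Filter one micro-batch of (key, measurement) records.

    `store` plays the role of the streaming engine's state store: it
    outlives the batch, so each key's filter resumes where it left off.
    Returns (key, measured, estimated) rows, as an output sink would see them.
    """
    out = []
    for key, z in batch:
        state = kalman_step(store.get(key, initial), z, q, r)
        store[key] = state
        out.append((key, z, state[0]))
    return out
```

Because each key's state is carried over, filtering the records in two micro-batches gives exactly the same estimates as filtering them in one pass. That is the property a checkpointed state store is meant to preserve. The (measured, estimated) pairs in the output are also the raw material for the kind of predicted-versus-measured comparison the article plots.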

This is going on my “reread carefully” list; it’s very interesting and goes deep into the topic.

September 2018