Building A Prediction Engine

Richard Williamson explains how to build a prediction engine using technologies such as Spark, Kudu, Impala, and Kafka:

We’ll aim to predict the volume of events for the next 10 minutes using a streaming regression model, and compare those results to a traditional batch prediction method. This prediction could then be used to dynamically scale compute resources, or for other business optimization. I will start out by describing how you would do the prediction through traditional batch processing methods using both Apache Impala (incubating) and Apache Spark, and then finish by showing how to more dynamically predict usage by using Spark Streaming.
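
The streaming approach can be sketched in plain Python, without Spark, to show the core idea: a linear model whose weights take one stochastic-gradient step each time a new mini-batch of window counts arrives. This is a minimal sketch under stated assumptions, not the author's implementation. The features (the previous few window counts), the step size, the names make_examples, StreamingLinearRegression and batch_mean_forecast, and the historical-mean batch baseline are all hypothetical. Counts are assumed to be scaled to roughly 0..1, because raw volumes would make a fixed SGD step size unstable.

```python
from typing import List, Sequence, Tuple

# (features, label): features are the previous `lags` window counts, label is the next count.
Example = Tuple[List[float], float]


def make_examples(counts: Sequence[float], lags: int) -> List[Example]:
    """Turn a series of per-window counts into (previous `lags` counts, next count) pairs."""
    return [(list(counts[i - lags:i]), counts[i]) for i in range(lags, len(counts))]


class StreamingLinearRegression:
    """Hypothetical online linear model: one SGD step per arriving mini-batch."""

    def __init__(self, num_features: int, step_size: float = 0.1) -> None:
        self.weights = [0.0] * num_features
        self.bias = 0.0
        self.step_size = step_size

    def predict(self, features: Sequence[float]) -> float:
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def update(self, batch: Sequence[Example]) -> None:
        """Take one gradient step on the squared error averaged over this mini-batch.

        Gradients are accumulated with the current weights before any change is
        applied; the constant factor 2 of the squared-error gradient is folded
        into step_size.
        """
        if not batch:
            return
        grad_w = [0.0] * len(self.weights)
        grad_b = 0.0
        for features, label in batch:
            error = self.predict(features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        n = len(batch)
        self.weights = [w - self.step_size * g / n for w, g in zip(self.weights, grad_w)]
        self.bias -= self.step_size * grad_b / n


def batch_mean_forecast(history: Sequence[float]) -> float:
    """Hypothetical batch baseline: forecast the next window as the historical mean."""
    return sum(history) / len(history)


if __name__ == "__main__":
    # Synthetic, already-scaled counts standing in for real per-window volumes.
    counts = [0.5 + 0.05 * (i % 6) for i in range(60)]
    model = StreamingLinearRegression(num_features=3, step_size=0.1)
    examples = make_examples(counts, lags=3)
    # Simulate a stream: the model sees five new windows at a time.
    for start in range(0, len(examples), 5):
        model.update(examples[start:start + 5])
    print("streaming forecast:", model.predict(counts[-3:]))
    print("batch baseline:   ", batch_mean_forecast(counts))
```

In Spark, MLlib's StreamingLinearRegressionWithSGD applies the same kind of per-micro-batch SGD update to a DStream through trainOn and predictOn; the plain-Python version only makes the update rule visible.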

Of course, the starting point for any prediction is a freshly updated data feed for the historic volume for which I want to forecast future volume. In this case, I discovered that Meetup.com has a very nice data feed that can be used for demonstration purposes. You can read more about the API here, but all you need to know at this point is that it provides a steady stream of RSVP volume that we can use to predict future RSVP volume.
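
Before any model can be trained, the raw RSVP stream has to be reduced to a volume per time window. A minimal sketch, assuming each RSVP has already been parsed into a Unix timestamp in seconds (the real feed's field names and format are not covered here) and using hypothetical helper names: events are bucketed into 10-minute windows to match the forecast horizon, and windows with no RSVPs are filled with zero so the series has no gaps.

```python
from collections import Counter
from typing import Iterable, List, Tuple

WINDOW_SECONDS = 600  # 10-minute windows, matching the forecast horizon (assumption)


def window_start(epoch_seconds: int, window: int = WINDOW_SECONDS) -> int:
    """Round a timestamp down to the start of its window."""
    return epoch_seconds - epoch_seconds % window


def rsvp_volume_by_window(
    event_times: Iterable[int], window: int = WINDOW_SECONDS
) -> List[Tuple[int, int]]:
    """Count events per window, including zero-count windows between first and last."""
    counts = Counter(window_start(t, window) for t in event_times)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    return [(start, counts.get(start, 0)) for start in range(first, last + window, window)]


if __name__ == "__main__":
    # Hypothetical RSVP timestamps (seconds since the epoch).
    print(rsvp_volume_by_window([0, 10, 599, 600, 1900]))
```

Filling empty windows matters: a regression over lagged counts assumes consecutive entries in the series are exactly one window apart.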

This is pretty dense, but it is a great look at one potential architecture leveraging Spark and several tools in the Hadoop ecosystem.
