Unit Testing Of Spark Streaming

Felipe Fernandez shows how to unit test Spark Streaming:

Controlling the lifecycle of Spark can be cumbersome and tedious. Fortunately, Spark Testing Baseproject offers us Scala Traits that handle those low-level details for us. Streaming has an extra bit of complexity as we need to produce data for ingestion in a timely way. At the same time, Spark internal clock needs to tick in a controlled way if we want to test timed operations as sliding windows.

This is part one of a series.  I’m interesting in seeing where this goes.

Related Posts

Working With Key-Value Pairs In Spark

Teena Vashist shows us a few of the functions available with Spark for working with key-value pairs: 1. Creating Key/Value Pair RDD:  The pair RDD arranges the data of a row into two parts. The first part is the Key and the second part is the Value. In the below example, I used a parallelize method to create a RDD, […]

Read More

Load Testing Spark To MongoDB

Abdelghani Tassi has a quick load test to see how fast Spark can load data into MongoDB: Recently, my company faced the serious challenge of loading a 10 million rows of CSV-formatted geographic data to MongoDB in real-time. We first tried to make a simple Python script to load CSV files in memory and send […]

Read More


August 2016
« Jul Sep »