Unit Testing Of Spark Streaming

Felipe Fernandez shows how to unit test Spark Streaming:

Controlling the lifecycle of Spark can be cumbersome and tedious. Fortunately, Spark Testing Base project offers us Scala Traits that handle those low-level details for us. Streaming has an extra bit of complexity as we need to produce data for ingestion in a timely way. At the same time, Spark internal clock needs to tick in a controlled way if we want to test timed operations as sliding windows.

This is part one of a series. I’m interested in seeing where this goes.
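
To make the "controlled clock" idea concrete, here is a minimal sketch in plain Python. It does not use Spark or Spark Testing Base at all; `ManualClock` and `run_windowed_count` are hypothetical names invented for this illustration. The assumption is simply that each clock tick delivers one batch and that a sliding window covers the last few batches, which is roughly the situation a streaming test has to simulate.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


class ManualClock:
    """Hypothetical test clock: time moves only when the test advances it."""

    def __init__(self, start_ms: int = 0) -> None:
        self.now_ms = start_ms

    def advance(self, delta_ms: int) -> None:
        self.now_ms += delta_ms


def run_windowed_count(
    batches: Sequence[Sequence[str]],
    batch_ms: int,
    window_batches: int,
    slide_batches: int,
) -> List[Tuple[int, int]]:
    """Feed one batch per tick and count elements over a sliding window.

    Returns (clock time in ms, element count in the window) for every
    slide interval. This imitates a windowed count; it is not Spark code.
    """
    clock = ManualClock()
    # Keep only the sizes of the most recent `window_batches` batches.
    window: Deque[int] = deque(maxlen=window_batches)
    outputs: List[Tuple[int, int]] = []
    for index, batch in enumerate(batches, start=1):
        clock.advance(batch_ms)      # the test, not the wall clock, drives time
        window.append(len(batch))    # ingest the batch produced for this tick
        if index % slide_batches == 0:
            outputs.append((clock.now_ms, sum(window)))
    return outputs
```

Because the test owns the clock, the result is deterministic: feeding the batches `[["a", "b"], ["c"], [], ["d", "e", "f"]]` with a one-second batch interval, a window of two batches, and a slide of one batch gives the same window counts on every run, with no sleeping or waiting on real time.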

