Learning Spark Structured Streaming

Jules Damji has a nice compendium of links and additional resources for people wanting to learn more about Apache Spark’s Structured Streaming:

Structured Streaming In Apache Spark: A new high-level API for streaming

Databricks’ engineers and Apache Spark committers Matei Zaharia, Tathagata Das, Michael Armbrust and Reynold Xin expound on why streaming applications are difficult to write, and how Structured Streaming addresses all the underlying complexities.

There’s quite a bit of reading material on the other side.

Related Posts

Stream-To-Stream Joins In Spark

Ayush Tiwari shows how to join a pair of streams in Apache Spark 2.3: In Spark 2.3, it added support for stream-stream joins, i.e, we can join two streaming Datasets/DataFrames and in this blog we are going to see how beautifully spark now give support for joining the two streaming dataframes. I this example, I […]

Read More

Spark: DataFrame To RDD For Data Cleansing

Gilad Moscovitch walks us through a common data cleansing problem with Spark data frames: A problem can arise when one of the inner fields of the json, has undesired non-json values in some of the records. For instance, an inner field might contains HTTP errors, that would be interpreted as a string, rather than as a […]

Read More

Categories

August 2017
MTWTFSS
« Jul Sep »
 123456
78910111213
14151617181920
21222324252627
28293031