Learning Spark Structured Streaming

Jules Damji has a nice compendium of links and additional resources for people wanting to learn more about Apache Spark’s Structured Streaming:

Structured Streaming In Apache Spark: A new high-level API for streaming

Databricks’ engineers and Apache Spark committers Matei Zaharia, Tathagata Das, Michael Armbrust and Reynold Xin expound on why streaming applications are difficult to write, and how Structured Streaming addresses all the underlying complexities.

There’s quite a bit of reading material on the other side.

Related Posts

Using Databricks Delta In Lieu Of Lambda Architecture

Jose Mendes contrasts the Lambda architecture with the Databricks Delta architecture and gives us a quick example of using Databricks Delta: The major problem of the Lambda architecture is that we have to build two separate pipelines, which can be very complex, and, ultimately, difficult to combine the processing of batch and real-time data, however, […]

Read More

An Overview Of Apache Kafka

Leona Zhang has a series going on Apache Kafka.  Part one covers some of the concepts around messaging systems: There is a difference between batch processing applications and stream processing applications. A visible boundary determines the most significant difference between batch processing and stream processing. If it exists, it is called batch processing. For example, […]

Read More

Categories

August 2017
MTWTFSS
« Jul Sep »
 123456
78910111213
14151617181920
21222324252627
28293031