Using Spark Streaming On Kafka

Ayush Tiwari has an introductory tutorial on using Spark Streaming on top of Kafka:

The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the new Kafka consumer API instead of the simple API, there are notable differences in usage. This version of the integration is marked as experimental, so the API is potentially subject to change.

In this blog, I am going to implement the basic example on Spark Structured Streaming & Kafka Integration.

This is a code-heavy tutorial, so check it out.

Related Posts

Page Ranking With Kafka Streams

Hunter Kelly walks through a page ranking algorithm: Once you have the adjacency matrix, you perform some straightforward matrix calculations to calculate a vector of Hub scores and a vector of Authority scores as follows: Sum across the columns and normalize, this becomes your Hub vector Multiply the Hub vector element-wise across the adjacency matrix […]

Read More

Stateful Processing In Spark Streaming

Bill Chambers and Jules Damji look at a couple of stateful scenarios within Spark Streaming: No streaming events are free of duplicate entries. Dropping duplicate entries in record-at-a-time systems is imperative—and often a cumbersome operation for a couple of reasons. First, you’ll have to process small or large batches of records at time to discard […]

Read More

Categories

September 2017
MTWTFSS
« Aug Oct »
 123
45678910
11121314151617
18192021222324
252627282930