Using Spark Streaming On Kafka

Ayush Tiwari has an introductory tutorial on using Spark Streaming on top of Kafka:

The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the new Kafka consumer API instead of the simple API, there are notable differences in usage. This version of the integration is marked as experimental, so the API is potentially subject to change.

In this blog, I am going to implement the basic example on Spark Structured Streaming & Kafka Integration.

This is a code-heavy tutorial, so check it out.

Related Posts

Deploying Cloudera Enterprise On Azure

Xavier Morera announces a new Cloudera course: You will start by learning the Microsoft Azure services required to deploy a secure, elastic, Cloudera Enterprise cluster. These core services include security, networking, virtual machine management, and storage, just to name a few. Then, you’ll learn best practices and patterns for cloud-based clusters, including tips and caveats for security […]

Read More

Working With The Databricks API Via Powershell

Gerhard Brueckl has a Powershell module for interacting with Databricks, either Azure or AWS: As most of our deployments use PowerShell I wrote some cmdlets to easily work with the Databricks API in my scripts. These included managing clusters (create, start, stop, …), deploying content/notebooks, adding secrets, executing jobs/notebooks, etc. After some time I ended […]

Read More

Categories

September 2017
MTWTFSS
« Aug Oct »
 123
45678910
11121314151617
18192021222324
252627282930