Analyzing Real-Time Data

Manjeet Chayel connects Spark Streaming to Amazon Kinesis and shows how to analyze the data in real time:

To use this post to play around with streaming data, you need an AWS account and the AWS CLI configured on your machine. The entire pattern can be implemented in a few simple steps:

  1. Create an Amazon Kinesis stream.

  2. Spin up an EMR cluster with Hadoop, Spark, and Zeppelin applications from advanced options.

  3. Use a simple Java producer to push random IoT event data into the Amazon Kinesis stream.

  4. Connect to the Zeppelin notebook.

  5. Import the Zeppelin notebook from GitHub.

  6. Analyze and visualize the streaming data.
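To give a feel for step 3, here is a minimal sketch of a producer that pushes random IoT-style events into a Kinesis stream. The original post uses a Java producer; this is a Python stand-in using boto3, not the author's code. The event fields (`deviceId`, `temperature`, `timestamp`), the sensor names, and the default region are all assumptions made up for illustration.

```python
import json
import random
import time

# Hypothetical device names; the original post's event schema is not shown here.
SENSOR_IDS = [f"sensor-{i}" for i in range(1, 6)]


def make_event(rng=random):
    """Build one random IoT-style reading (field names are assumptions)."""
    return {
        "deviceId": rng.choice(SENSOR_IDS),
        "temperature": round(rng.uniform(15.0, 35.0), 2),
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
    }


def to_kinesis_record(event, stream_name):
    """Shape an event into the keyword arguments that put_record expects."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        # Partitioning by device keeps each device's events on one shard.
        "PartitionKey": event["deviceId"],
    }


def send_events(stream_name, count, region="us-east-1"):
    """Push `count` random events to an existing stream (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("kinesis", region_name=region)
    for _ in range(count):
        client.put_record(**to_kinesis_record(make_event(), stream_name))
```

Calling `send_events("my-iot-stream", 100)` would write 100 records, assuming a stream with that (hypothetical) name already exists from step 1 and that credentials are configured for the AWS CLI.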

This is a good way of getting started with streaming data. I’ve grown quite fond of notebooks in the short time that I’ve used them, as they make it very easy for people who know what they’re doing to provide code and information to people who want to know what they’re doing.



June 2016