Performance Tuning A Streaming Application

Mathieu Dumoulin explains how he was able to get 10x performance out of a streaming application built around Kafka, Spark Streaming, and Apache Ignite:

The main issues for these applications were caused by trying to run a development system’s code, tested on AWS instances on a physical, on-premise cluster running on real data. The original developer was never given access to the production cluster or the real data.

Apache Ignite was a huge source of problems, principally because it is such a new project that nobody had any real experience with it and also because it is not a very mature project yet.

I found this article fascinating, particularly because the answer was a lot more than just “throw some more hardware at the problem.”

Related Posts

Cassandra To Kafka Connect

Mike Barlotta shows how to feed data into Kafka from Cassandra via Kafka Connect.  Part one involves basic setup: Modeling data in Cassandra must be done around the queries that are needed to access the data (see this article for details). Typically this means that there will be one table for each query and data (in our […]

Read More

Use Cases For Apache Kafka

Amy Boyle shows a few scenarios where New Relic uses Apache Kafka: The Events Pipeline team is responsible for plumbing some of New Relic’s core data streams-specifically, event data. These are fine-grained nuggets of monitoring data that record a single event at a particular moment in time. For example, an event could be an error thrown […]

Read More


January 2017
« Dec Feb »