Pipeline Architecture With Kafka

Alexandra Wang describes how Pandora Media has used Apache Kafka for real-time ad serving using Kafka Connect:

Our ad server publishes billions of messages per day to Kafka. We soon realized that writing a proprietary Kafka consumer able to handle that amount of data with the desired offset management logic would be non-trivial, especially when requiring exactly-once-delivery semantics. We found that the Kafka Connect API paired with the HDFS connector developed by Confluent would be perfect for our use case.

We’ve also found it painful not having a central authority on data structures that can share their respective schemas across all services and applications. Without a central registry for message schemas, data serialization and deserialization for a variety of applications are troublesome and the pipeline is fragile when schema evolution happens. We found Schema Registry is a great solution for this problem.

To address the above two problems, we integrated the Kafka Connect API and Schema Registry into our Kafka-centered data pipeline.

Well worth reading, especially the difficulties that they’ve had during maintenance periods and in lower environments.

Related Posts

Testing Kafka Streams Applications

Yeva Byzek continues her series on testing Kafka-based streaming applications: When you create a stream processing application with Kafka’s Streams API, you create a Topologyeither using the StreamsBuilder DSL or the low-level Processor API. Normally, the topology runs with the KafkaStreams class, which connects to a Kafka cluster and begins processing when you call start(). For testing though, connecting to a running […]

Read More

Review: dbForge Studio For Database Modeling

Randolph West is looking for a product for database modeling and tries out dbForge Studio: These days I still design new databases from scratch with pen and paper (or iPad and Apple Pencil), where the entity relationship diagram (ERD) is rudimentary and crows’ feet relationships are badly-scrawled. But it got me wondering which database modelling […]

Read More

Categories

April 2017
MTWTFSS
« Mar May »
 12
3456789
10111213141516
17181920212223
24252627282930