Pipeline Architecture With Kafka

Alexandra Wang describes how Pandora Media has used Apache Kafka for real-time ad serving using Kafka Connect:

Our ad server publishes billions of messages per day to Kafka. We soon realized that writing a proprietary Kafka consumer able to handle that amount of data with the desired offset management logic would be non-trivial, especially when requiring exactly-once-delivery semantics. We found that the Kafka Connect API paired with the HDFS connector developed by Confluent would be perfect for our use case.

We’ve also found it painful not having a central authority on data structures that can share their respective schemas across all services and applications. Without a central registry for message schemas, data serialization and deserialization for a variety of applications are troublesome and the pipeline is fragile when schema evolution happens. We found Schema Registry is a great solution for this problem.

To address the above two problems, we integrated the Kafka Connect API and Schema Registry into our Kafka-centered data pipeline.

Well worth reading, especially the difficulties that they’ve had during maintenance periods and in lower environments.

Related Posts

Metacat: Federated Metadata Discovery

Ajoy Majumdar and Zhen Li walk us through Metacat: The core architecture of the big data platform at Netflix involves three key services. These are the execution service (Genie), the metadata service, and the event service. These ideas are not unique to Netflix, but rather a reflection of the architecture that we felt would be […]

Read More

Understanding A Spark Streaming Workflow

Himanshu Gupta continues a series on structured streaming using Spark Streaming: Here we can clearly see that if new data is pushed to the source, Spark will run the “incremental” query that combines the previous running counts with the new data to compute updated counts. The “Input Table” here is the lines DataFrame which acts as a […]

Read More

Categories

April 2017
MTWTFSS
« Mar May »
 12
3456789
10111213141516
17181920212223
24252627282930