Pipeline Architecture With Kafka

Alexandra Wang describes how Pandora Media has used Apache Kafka for real-time ad serving using Kafka Connect:

Our ad server publishes billions of messages per day to Kafka. We soon realized that writing a proprietary Kafka consumer able to handle that amount of data with the desired offset management logic would be non-trivial, especially when requiring exactly-once-delivery semantics. We found that the Kafka Connect API paired with the HDFS connector developed by Confluent would be perfect for our use case.

We’ve also found it painful not having a central authority on data structures that can share their respective schemas across all services and applications. Without a central registry for message schemas, data serialization and deserialization for a variety of applications are troublesome and the pipeline is fragile when schema evolution happens. We found Schema Registry is a great solution for this problem.

To address the above two problems, we integrated the Kafka Connect API and Schema Registry into our Kafka-centered data pipeline.

Well worth reading, especially the difficulties that they’ve had during maintenance periods and in lower environments.

Related Posts

Using h2o.ai On HDInsight

Xiaoyong Zhu shows how to set up h2o.ai on Azure HDInsight: H2O Flow is an interactive web-based computational user interface where you can combine code execution, text, mathematics, plots and rich media into a single document, much like Jupyter Notebooks. With H2O Flow, you can capture, rerun, annotate, present, and share your workflow. H2O Flow […]

Read More

HDInsight 3.6 Available

Ashish Thapliyal points out some Hive improvements in HDInsight 3.6: 2 Create a new Hive table from scratch or alter Table Create a new table by, clicking on the ‘+’ icon, which opens the create table wizard. Enter table name, column name and choose a data type from the dropdown. You can pick folloiwng advanced […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

Categories

April 2017
MTWTFSS
« Mar  
 12
3456789
10111213141516
17181920212223
24252627282930