Exactly-Once Writes From Kafka To S3

Konstantine Karantasis takes us through writing from a Kafka topic into S3:

When customers were asking for an S3 connector, there were already several Kafka-to-S3 solutions out there at the time, so we had to decide whether to adopt an existing S3 connector, modify the Kafka Connect HDFS connector (as some developers attempted to do) or write a new connector from scratch.

We knew that our users needed three things from the connector:
1. Integration with the Kafka Connect API: Connect’s scaling and fault tolerance capabilities were important to have, and users didn’t want yet another system that they’d need to learn how to use, deploy and monitor.
2. Exactly once: Users didn’t want to waste expensive compute cycles on deduplicating their data. And no one likes missing events.
3. No extra dependencies: Especially dependencies on additional datastores. Kafka clients and the S3 SDK libraries should be all you need to get events from Kafka to S3. Simplicity rules, especially in a distributed systems world where simple is often the key to being reliable.

When we considered the existing connectors, we noticed that none of them delivered the reliability and exactly once capabilities we wanted. They treat S3 like it’s another file system—though it isn’t really. For example, S3 lacks file appends, it is eventually consistent, and listing a bucket is often a very slow operation.

Click through for a dive into what Confluent did and how it works.

Related Posts

Query Editing with Azure SQL Database

Dave Bland shows off the Azure SQL Database query editor built into the Azure portal: We have all been using SQL Server Management Studio to query and manipulate data, even for an Azure SQL database.  There is also an option to do this same thing built into the SQL Azure database interface in the Azure […]

Read More

Backing Up Database to Azure Blob Storage

Jamie Wick shows us how we can back up database directly to Azure Blob Storage: Azure storage, as a backup destination for SQL backups, is a great option for organizations that are contemplating replacing older on-prem NAS appliances or improve their Disaster Recovery functionality. The tiered storage pricing, along with local and global redundancy options, […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

April 2019
MTWTFSS
« Mar  
1234567
891011121314
15161718192021
22232425262728
2930