Data Governance On Apache Kafka With Lenses

Kevin Feasel

2018-05-22

Hadoop

Antonios Chalkiopoulos explains how Landoop’s Lenses product helps with data governance:

One of the fundamental requirements of GDPR is the Right to Retrieve Personal Data.

With Lenses SQL, this requirement can be covered by a set of simple but thorough queries against the topics that contain PII:

SELECT * FROM topicA WHERE customer.id = 'XXX'
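The Python sketch below is a rough illustration of what such a query does. It is not Lenses' implementation. It decodes every record in a topic and keeps those whose nested customer.id equals the requested value, which is the core of a Right to Retrieve lookup.

The sketch rests on a few assumptions:

- Each message value is JSON. A real topic would more likely hold Avro and need a schema-aware deserializer.
- The names select_where and get_path are hypothetical.
- The topic_a records are made-up sample data.

```python
import json
from typing import Any, Iterable, Iterator


def get_path(record: dict, dotted: str) -> Any:
    """Follow a dotted path such as 'customer.id' into nested dicts; None if absent."""
    value: Any = record
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def select_where(raw_values: Iterable[bytes], path: str, expected: Any) -> Iterator[dict]:
    """Hypothetical, rough equivalent of SELECT * FROM topic WHERE <path> = <expected>.

    Assumption: each raw message value is JSON. An Avro topic would need a
    schema-aware deserializer here instead of json.loads.
    """
    for raw in raw_values:
        record = json.loads(raw)  # deserialize binary value into a Python dict
        if get_path(record, path) == expected:
            yield record


# Hypothetical topic contents: one serialized record per Kafka message value.
topic_a = [
    b'{"customer": {"id": "XXX", "email": "a@example.com"}, "amount": 10}',
    b'{"customer": {"id": "YYY", "email": "b@example.com"}, "amount": 25}',
    b'{"customer": {"id": "XXX", "email": "a@example.com"}, "amount": 7}',
]

matches = list(select_where(topic_a, "customer.id", "XXX"))
print([m["amount"] for m in matches])  # [10, 7]
```

A finite list like topic_a ends on its own. A live Kafka topic does not, so a scan like this has no natural stopping point.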

Lenses will retrieve the data and deserialize it from a binary format (e.g. Avro) into a human-readable format, and provide full Control Execution.

Control Execution accounts for the fact that streaming SQL operates on unbounded streams of events: a query would normally never end. To bring query-termination semantics to Apache Kafka, we introduced four controls:

  • LIMIT 10000 – Force the query to terminate when 10,000 records are matched

  • max.bytes = 20000000 – Force the query to terminate once 20 MB (20,000,000 bytes) have been retrieved

  • max.time = 60000 – Force the query to terminate after 60 seconds

  • max.zero.polls = 8 – Force the query to terminate after 8 consecutive polls return no records, indicating that the topic has been exhausted
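One way to picture the four controls is as stop conditions checked inside a consumer's poll loop. The Python below is a hypothetical sketch of that idea, not how Lenses implements them.

Its assumptions:

- poll stands in for a Kafka consumer's poll call. It is assumed to wait up to some timeout and return a possibly empty batch of message values.
- run_bounded_query, QueryLimits and make_poller are illustrative names.
- max.bytes counts every byte retrieved, whether or not the record matched.
- max.zero.polls only counts empty polls that occur in a row.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class QueryLimits:
    limit: int = 10_000          # LIMIT: stop after this many matched records
    max_bytes: int = 20_000_000  # max.bytes: stop after this many bytes retrieved
    max_time_ms: int = 60_000    # max.time: stop after this much elapsed time
    max_zero_polls: int = 8      # max.zero.polls: stop after this many empty polls in a row


def run_bounded_query(
    poll: Callable[[], List[bytes]],
    matches: Callable[[bytes], bool],
    limits: QueryLimits,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[bytes], str]:
    """Hypothetical sketch: filter an unbounded stream until a control fires.

    Returns the matched values and the name of the control that stopped the query.
    """
    start = clock()
    results: List[bytes] = []
    bytes_read = 0
    zero_polls = 0
    while True:
        if (clock() - start) * 1000 >= limits.max_time_ms:
            return results, "max.time"
        batch = poll()
        if not batch:
            zero_polls += 1  # only consecutive empty polls count
            if zero_polls >= limits.max_zero_polls:
                return results, "max.zero.polls"
            continue
        zero_polls = 0
        for value in batch:
            bytes_read += len(value)  # count every retrieved byte, matched or not
            if matches(value):
                results.append(value)
                if len(results) >= limits.limit:
                    return results, "LIMIT"
            if bytes_read >= limits.max_bytes:
                return results, "max.bytes"


def make_poller(batches: List[List[bytes]]) -> Callable[[], List[bytes]]:
    """Hypothetical stand-in for a consumer: replays batches, then returns empty polls."""
    pending = list(batches)

    def poll() -> List[bytes]:
        return pending.pop(0) if pending else []

    return poll


batches = [[b"XXX:1", b"YYY:2"], [b"XXX:3"]]
rows, reason = run_bounded_query(
    make_poller(batches), lambda v: v.startswith(b"XXX"), QueryLimits()
)
print(rows, reason)  # [b'XXX:1', b'XXX:3'] max.zero.polls
```

Here the fake topic runs dry after two batches, so the query stops on max.zero.polls. Lowering limit or max_bytes in QueryLimits makes it stop on LIMIT or max.bytes instead. The clock is injectable, so the max.time path can be exercised without waiting a full minute.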

GDPR implementation is a lot trickier for a system like Kafka, but it’s still possible.

