Press "Enter" to skip to content

Data Governance On Apache Kafka With Lenses

Antonios Chalkipoulos explains how Landoop’s Lenses product helps with data governance:

One of the fundamental requirements of GDPR is theĀ Right to Retrieve Personal Data.

With Lenses SQL the above requirement can be covered via a set of simple but thorough queries into the topics that contain PII data:

SELECT * from topicA WHERE customer.id = "XXX"

Lenses will retrieve and deserialize the data from a binary format (i.e. Avro) into a human-readable format and provide fullĀ Control Execution.

Control Execution brings into context the fact that streaming SQL is operating on un-bounded streams of events: A query would normally be a never-ending query. In order to bring query termination schemantics into Apache Kafka we introduced 4 controls:

  • LIMIT 10000 – Force the query to terminate when 10,000 records are matched

  • max.bytes = 20000000 – Force the query to terminate once 20 MBytes have been retrieved

  • max.time = 60000 – Force the query to terminate after 60 seconds

  • max.zero.polls = 8 – Force the query to terminate after 8 consecutive polls are empty, indicating we have exhausted a topic

GDPR implementation is a lot trickier for a system like Kafka, but it’s still possible.