Antonios Chalkiopoulos explains how Landoop’s Lenses product helps with data governance:
One of the fundamental requirements of GDPR is the Right to Retrieve Personal Data.
With Lenses SQL the above requirement can be covered via a set of simple but thorough queries into the topics that contain PII data:
SELECT * FROM topicA WHERE customer.id = "XXX"
Lenses will retrieve and deserialize the data from a binary format (e.g. Avro) into a human-readable format and provide full Control Execution.
Control Execution addresses the fact that streaming SQL operates on unbounded streams of events: a query would normally never end. To bring query termination semantics into Apache Kafka we introduced four controls:
- LIMIT 10000 – Force the query to terminate when 10,000 records are matched
- max.bytes = 20000000 – Force the query to terminate once 20 MBytes have been retrieved
- max.time = 60000 – Force the query to terminate after 60 seconds
- max.zero.polls = 8 – Force the query to terminate after 8 consecutive polls are empty, indicating we have exhausted a topic
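
To make the four termination controls concrete, here is a minimal Python sketch of how a poll loop might enforce them. It is an illustration only, not Lenses' implementation: the function name `run_bounded_query`, its parameters, and the `poll` callable (a stand-in for a Kafka consumer poll) are all hypothetical, and the sketch assumes the byte budget counts every record read rather than only the matching ones.

```python
import time
from typing import Callable, List, Tuple


def run_bounded_query(
    poll: Callable[[], List[bytes]],      # hypothetical stand-in for a consumer poll
    matches: Callable[[bytes], bool],     # the WHERE predicate
    limit: int = 10_000,                  # LIMIT 10000
    max_bytes: int = 20_000_000,          # max.bytes = 20000000
    max_time_ms: int = 60_000,            # max.time = 60000
    max_zero_polls: int = 8,              # max.zero.polls = 8
) -> Tuple[List[bytes], str]:
    """Collect matching records until one of the four controls fires.

    Returns the matched records and the name of the control that ended the query.
    """
    results: List[bytes] = []
    bytes_read = 0
    empty_polls = 0
    deadline = time.monotonic() + max_time_ms / 1000.0

    while True:
        # Time budget is checked before every poll.
        if time.monotonic() >= deadline:
            return results, "max.time"

        batch = poll()
        if not batch:
            empty_polls += 1
            if empty_polls >= max_zero_polls:
                return results, "max.zero.polls"
            continue
        empty_polls = 0  # only consecutive empty polls count

        for record in batch:
            bytes_read += len(record)  # assumption: all bytes read count
            if matches(record):
                results.append(record)
                if len(results) >= limit:
                    return results, "LIMIT"
            if bytes_read >= max_bytes:
                return results, "max.bytes"
```

Whichever control trips first ends the query, so a caller can tell from the returned reason whether the result is complete (the topic was exhausted) or was cut short by a record, size, or time budget.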
GDPR implementation is a lot trickier for a system like Kafka, but it’s still possible.