Data Governance On Apache Kafka With Lenses

Kevin Feasel

2018-05-22

Hadoop

Antonios Chalkipoulos explains how Landoop’s Lenses product helps with data governance:

One of the fundamental requirements of GDPR is the Right to Retrieve Personal Data.

With Lenses SQL the above requirement can be covered via a set of simple but thorough queries into the topics that contain PII data:

SELECT * from topicA WHERE customer.id = "XXX"

Lenses will retrieve and deserialize the data from a binary format (i.e. Avro) into a human-readable format and provide full Control Execution.

Control Execution brings into context the fact that streaming SQL is operating on un-bounded streams of events: A query would normally be a never-ending query. In order to bring query termination schemantics into Apache Kafka we introduced 4 controls:

  • LIMIT 10000 – Force the query to terminate when 10,000 records are matched

  • max.bytes = 20000000 – Force the query to terminate once 20 MBytes have been retrieved

  • max.time = 60000 – Force the query to terminate after 60 seconds

  • max.zero.polls = 8 – Force the query to terminate after 8 consecutive polls are empty, indicating we have exhausted a topic

GDPR implementation is a lot trickier for a system like Kafka, but it’s still possible.

Related Posts

Metacat: Federated Metadata Discovery

Ajoy Majumdar and Zhen Li walk us through Metacat: The core architecture of the big data platform at Netflix involves three key services. These are the execution service (Genie), the metadata service, and the event service. These ideas are not unique to Netflix, but rather a reflection of the architecture that we felt would be […]

Read More

Understanding A Spark Streaming Workflow

Himanshu Gupta continues a series on structured streaming using Spark Streaming: Here we can clearly see that if new data is pushed to the source, Spark will run the “incremental” query that combines the previous running counts with the new data to compute updated counts. The “Input Table” here is the lines DataFrame which acts as a […]

Read More

Categories

May 2018
MTWTFSS
« Apr Jun »
 123456
78910111213
14151617181920
21222324252627
28293031