Securing The Data Plane

Michael Schiebel gives an overview of security architecture inside a data lake:

Existing platform based Hadoop architectures make several implicit assumptions on how users interact with the platform such as developmental research versus production applications.  While this was perfectly good in a research mode, as we move to a modern data application architecture we need to bring back modern application concepts to the Hadoop ecosystem.  For example, existing Hadoop architectures tightly couple the user interface with the source of data.  This is done for good reasons that apply in a data discovery research context, but cause significant issues in developing and maintaining a production application.  We see this in some of the popular user interfaces such as Kibana, Banana, Grafana, etc.  Each user interface is directly tied to a specific type of data lake and imposes schema choices on that data.

Read the whole thing.  Also, “Securing the data plane” sounds like a terrible ’90s action film.

Related Posts

Page Ranking With Kafka Streams

Hunter Kelly walks through a page ranking algorithm: Once you have the adjacency matrix, you perform some straightforward matrix calculations to calculate a vector of Hub scores and a vector of Authority scores as follows: Sum across the columns and normalize, this becomes your Hub vector Multiply the Hub vector element-wise across the adjacency matrix […]

Read More

Stateful Processing In Spark Streaming

Bill Chambers and Jules Damji look at a couple of stateful scenarios within Spark Streaming: No streaming events are free of duplicate entries. Dropping duplicate entries in record-at-a-time systems is imperative—and often a cumbersome operation for a couple of reasons. First, you’ll have to process small or large batches of records at time to discard […]

Read More

Categories

August 2016
MTWTFSS
« Jul Sep »
1234567
891011121314
15161718192021
22232425262728
293031