Concurrency In Hadoop Using ZooKeeper

Kevin Feasel

2016-08-16

Hadoop

Garima Dosi discusses an architecture using ZooKeeper to introduce some limited protections for concurrent access in HDFS:

As per the design, the ZooKeeper node topology looks like this. ZooKeeper works like a filesystem: it starts with a root directory, followed by several nodes (analogous to folders) and finally the leaf nodes that hold the data (analogous to files). The circles in the image represent the names of the properties (folders) that we are trying to maintain, and the rounded boxes are the values (files) for those properties.
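To make the folder/file analogy concrete, here is a minimal in-memory sketch of a path-addressed node tree. It is not the ZooKeeper client API, and the paths (`/hdfs_concurrency`, `global_version`, `reads`, `write_lock`) are hypothetical names standing in for the properties in the diagram, populated with the numbers the post describes. One difference from a real filesystem: in ZooKeeper any znode can hold data and have children at the same time, so the folder/file split is only an analogy.

```python
class ZNodeTree:
    """In-memory stand-in for a ZooKeeper namespace; not the ZooKeeper client API."""

    def __init__(self):
        # Every node is addressed by its absolute, slash-separated path.
        self._data = {"/": b""}

    def create(self, path, data=b""):
        # As in ZooKeeper, a node can only be created under an existing parent.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._data:
            raise KeyError(f"parent {parent!r} does not exist")
        if path in self._data:
            raise KeyError(f"{path!r} already exists")
        self._data[path] = data

    def get(self, path):
        return self._data[path]

    def children(self, path):
        # Direct children only, analogous to listing a single folder.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._data
            if p.startswith(prefix)
            and len(p) > len(prefix)
            and "/" not in p[len(prefix):]
        )


# Hypothetical layout mirroring the state described in the post.
tree = ZNodeTree()
tree.create("/hdfs_concurrency")
tree.create("/hdfs_concurrency/global_version", b"100")
tree.create("/hdfs_concurrency/reads")
tree.create("/hdfs_concurrency/reads/98", b"10")  # 10 readers on version 98
tree.create("/hdfs_concurrency/reads/99", b"20")  # 20 readers on version 99
tree.create("/hdfs_concurrency/write_lock", b"held")
print(tree.children("/hdfs_concurrency"))  # ['global_version', 'reads', 'write_lock']
```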

So, the image above shows that the “global version” is 100, that 10 and 20 read requests are being executed on versions 98 and 99 respectively, and that, since a write request is in progress, no other write request will be taken up until it completes.
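The rules in that paragraph can be modeled as a small coordinator: readers pin themselves to the current global version and are counted per version, only one writer may run at a time, and finishing a write bumps the global version. This is one plausible reading of the design, sketched in-process with a `threading.Lock` in place of ZooKeeper; the class and method names are hypothetical. In a real deployment the mutex would presumably give way to ZooKeeper primitives such as an ephemeral znode for the write lock and versioned (compare-and-set) `setData` updates for the counters, but that mapping is an assumption, not something the excerpt spells out.

```python
import threading


class VersionCoordinator:
    """Hypothetical in-process model: global version, per-version read counts, one write lock."""

    def __init__(self, global_version):
        self._mutex = threading.Lock()  # stands in for ZooKeeper's atomic updates
        self.global_version = global_version
        self.reads = {}                 # version -> number of active readers
        self.write_in_progress = False

    def begin_read(self):
        # A reader pins the current global version and registers against it.
        with self._mutex:
            v = self.global_version
            self.reads[v] = self.reads.get(v, 0) + 1
            return v

    def end_read(self, version):
        # Deregister; drop the entry once no reader is left on that version.
        with self._mutex:
            self.reads[version] -= 1
            if self.reads[version] == 0:
                del self.reads[version]

    def try_begin_write(self):
        # Only one write at a time; another writer is turned away until it completes.
        with self._mutex:
            if self.write_in_progress:
                return False
            self.write_in_progress = True
            return True

    def end_write(self):
        # Publishing a write bumps the global version; readers already pinned
        # to older versions keep their counts.
        with self._mutex:
            self.global_version += 1
            self.write_in_progress = False


# Rebuild the state from the post: version 100, 10 readers on 98, 20 on 99.
coord = VersionCoordinator(global_version=98)
for _ in range(10):
    coord.begin_read()
coord.try_begin_write()
coord.end_write()           # global version -> 99
for _ in range(20):
    coord.begin_read()
coord.try_begin_write()
coord.end_write()           # global version -> 100
print(coord.global_version, coord.reads)  # 100 {98: 10, 99: 20}
print(coord.try_begin_write())            # True: a write is now in progress
print(coord.try_begin_write())            # False: the second writer must wait
```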

This feels a little overly complicated to me.
