Getting Started With Hadoop

Kevin Feasel

2016-08-09

Hadoop

Jon Morisi is looking at Hadoop:

From here I think I’ll start playing around on a sandbox.  Each of the distributions offers a way to spin up a VM or log into a cloud based environment.  There are also docker images out there. (search hadoop, cloudera, or hortonworks).  Most of these docker images look fairly new, so don’t cut yourself.

I’m looking at the HortonWorks distro, so I’ll probably setup a Hortonworks Sandbox.

Jon includes some resources he’s used to learn a bit about the topic.  I think he’s going down the right path with videos and mailing lists—the ecosystem changes too quickly for books to have much long-term value, and mailing lists & forums tend to be better for keeping up to date.  My biggest suggestion is to get case studies and play around.  Check out studies on e-mail ingest, real-time data analysis, and analyzing fantasy sports for starters.

Related Posts

Anomaly Detection With Kafka Streams

Ajmal Karuthakantakath shows us an application which performs fairly simple anomaly detection using Kafka Streams: The problem is in the banking loan payment domain, where customers have taken a loan and they need to make monthly payments to repay the loan amount. Assume there are millions of customers in the system and all these customers need […]

Read More

Crossing The Streams With Kafka

Himani Arora shows how to join two Kafka streams together: KStream-KStream Join It is a sliding window join, that means, all tuples close to each other with regard to time are joined. Time here is the difference up to size of the window. These joins are always windowed joins because otherwise, the size of the internal state […]

Read More

Categories

August 2016
MTWTFSS
« Jul Sep »
1234567
891011121314
15161718192021
22232425262728
293031