The following was run on a fresh Ubuntu 16.04.2 LTS installation. The machine I’m using has an Intel Core i5 4670K clocked at 3.4 GHz, 8 GB of RAM and 1 TB of mechanical storage capacity.
First I’ve setup a standalone Hadoop environment following the instructions from my Hadoop 3 installation guide. Below I’ve installed Kafkacat for feeding and reading off of Kafka, libsnappy as I’ll be using Snappy compression on the Kafka topics, Python, Screen for running applications in the background and Zookeeper which is used by Kafka for coordination.
From there, Mark has the configuration scripts and processes to get the entire pipeline built.