Important Notes:
- start-dfs.sh starts the NameNode, SecondaryNameNode, and a DataNode on the master node, and a DataNode on every slave node (see the verification sketch after this list).
- start-yarn.sh starts the ResourceManager and a NodeManager on the master node, and a NodeManager on every slave node.
- Perform hadoop namenode -format only once; running it again will cause an incompatible clusterID exception on the DataNodes. To resolve this error, clear the DataNodes' temporary data location, i.e., remove the files in the $HADOOP_HOME/dfs/name/data folder (a recovery sketch follows below).
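
As a quick sanity check after running the start scripts, you can list the running Java daemons with jps on each node. A minimal sketch, assuming $HADOOP_HOME/sbin is on your PATH and passwordless SSH to the slaves is configured (the hostname slave1 is a placeholder for illustration):

# Start the HDFS and YARN daemons from the master node
start-dfs.sh
start-yarn.sh

# On the master, jps should list NameNode, SecondaryNameNode, DataNode,
# ResourceManager, and NodeManager (plus Jps itself)
jps

# On each slave (slave1 is a placeholder hostname), jps should list
# DataNode and NodeManager
ssh slave1 jps

If any daemon is missing from the jps output, the log files under $HADOOP_HOME/logs are the first place to look.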
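
If you do hit the incompatible clusterID error after a second format, recovery might look like the following sketch. The data directory path is the one mentioned above; your actual location is whatever dfs.datanode.data.dir points to in hdfs-site.xml, so confirm it before deleting anything:

# Stop HDFS before touching the DataNode storage directories
stop-dfs.sh

# Remove the stale DataNode data; repeat on every node that runs a DataNode
rm -rf $HADOOP_HOME/dfs/name/data/*

# Reformat the NameNode (this destroys HDFS metadata -- only do it on a
# fresh cluster or one whose data you can afford to lose)
hadoop namenode -format

# Bring HDFS back up
start-dfs.sh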
If you’d like to set up your own Hadoop cluster rather than use a distribution from one of the big vendors (Hortonworks, Cloudera, MapR) or a PaaS offering such as HDInsight or Elastic MapReduce, this guide should give you a head start.