Building A Multi-Node Hadoop Cluster With Spark

Rao Swati has a step-by-step instruction guide on how to set up a multi-node cluster with Hadoop 2.7.3 and Spark 1.6.2:

Important Notes:

  1. Start-dfs.sh  will start NameNode, SecondaryNamenode, DataNode on master and DataNode on all slaves node.
  2. Start-yarn.sh  will start NodeManager, ResourceManager on the master node and NodeManager on slaves.
  3. Perform  Hadoop namenode -format  only once otherwise you will get an incompatible cluster_id exception. To resolve this error clear temporary data location for datanode i.e, remove the files present in $HADOOP_HOME/dfs/name/data folder.

If you’d like to set up your own Hadoop cluster rather than using one of the big vendors (Hortonworks, Cloudera, MapR) or a PaaS solution like HDInsight or ElasticMapReduce, this will give you a head start.

Related Posts

Get Windows Failover Cluster Errors

John Morehouse walks us through the Get-ClusterLog cmdlet in Powershell: Sometimes you know that a problem occurred, but the tools are not giving you the right information.  If you ever look at the Cluster Failover Manager for a Windows Cluster, sometimes that can happen.  The user interface won’t show you any errors, but you KNOW […]

Read More

Azure Databricks Security

Tristan Robinson looks at what’s currently available in terms of security on Azure Databricks: You’ll notice that as part of this I’m retrieving the secrets/GUIDS I need for the connection from somewhere else – namely the Databricks-backed secrets store. This avoids exposing those secrets in plain text in your notebook – again this would not […]

Read More

Categories

December 2016
MTWTFSS
« Nov Jan »
 1234
567891011
12131415161718
19202122232425
262728293031