Building A Multi-Node Hadoop Cluster With Spark

Rao Swati has a step-by-step instruction guide on how to set up a multi-node cluster with Hadoop 2.7.3 and Spark 1.6.2:

Important Notes:

  1. Start-dfs.sh  will start NameNode, SecondaryNamenode, DataNode on master and DataNode on all slaves node.
  2. Start-yarn.sh  will start NodeManager, ResourceManager on the master node and NodeManager on slaves.
  3. Perform  Hadoop namenode -format  only once otherwise you will get an incompatible cluster_id exception. To resolve this error clear temporary data location for datanode i.e, remove the files present in $HADOOP_HOME/dfs/name/data folder.

If you’d like to set up your own Hadoop cluster rather than using one of the big vendors (Hortonworks, Cloudera, MapR) or a PaaS solution like HDInsight or ElasticMapReduce, this will give you a head start.

Related Posts

Apache Avro 1.9.0 Released

Fokko Driesprong announces the release of Apache Avro 1.9.0: Avro is a remote procedure call and data serialization framework developed within Apache’s Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. If you’re unfamiliar with Avro, I would highly recommend the explanation of Dennis Vriend […]

Read More

Proposed Max Server Memory Defaults

Randolph West has a proposal for default max server memory on a SQL Server instance: As noted in the previous post in this series, memory in SQL Server is generally divided between query plans in the plan cache, and data in the buffer pool (other uses for memory in SQL Server are listed later in this post). The official documentation tells us:[T]he […]

Read More

Categories

December 2016
MTWTFSS
« Nov Jan »
 1234
567891011
12131415161718
19202122232425
262728293031