HDInsight clusters consist of several virtual machines (nodes) serving different purposes. The most common architecture of an HDInsight cluster is – two head nodes, one or more worker nodes, and three zookeeper nodes.
Head nodes: Hadoop services are installed and run on head nodes. There are two head nodes to ensure high availability by allowing master services and components to continue to run on the secondary node in the event of a failure on the primary. Both head nodes are active and running within the cluster simultaneously. Some services, such as HDFS or YARN, are only ‘active’ on one head node at any given time (and ‘standby’ on the other head node). Other services such as HiveServer2 or Hive Metastore are active on both head nodes at the same time. There are services like Application Timeline Server (ATS) and Job History Server (JHS) which are installed on both head nodes but should run only on the head node where Ambari server is running. If these components sound unfamiliar, please revisit the article on Hadoop ecosystem in HDInsight.
Read on to see the other classes of nodes HDInsight uses.