Hadoop Name Node Capacity Planning

Kevin Feasel

2017-07-17

Hadoop

Mamta Chawla has some rules of thumb for sizing your Hadoop name node:

Both name node servers should have highly reliable storage for their namespace storage and edit-log journaling. That’s why — contrary to the recommended JBOD for data nodes — RAID is recommended for name nodes.

Master servers should have at least four redundant storage volumes — some local and some networked — but each can be relatively small (typically 1TB).

It is easy to determine the memory needed for both name node and secondary name node. The memory needed by name node to manage the HDFS cluster metadata in memory and the memory needed for the OS must be added together. Typically, the memory needed by the secondary name node should be identical to the name node.

Click through for some specific recommendations.

Related Posts

Last-Click Attribution With Databricks Delta

Caryl Yuhas and Denny Lee give us an example of building a last-click digital marketing attribution model with Databricks Delta: The first thing we will need to do is to establish the impression and conversion data streams.   The impression data stream provides us a real-time view of the attributes associated with those customers who were served the […]

Read More

Working With Kafka At Scale

Tony Mancill has some tips for working with large-scale Kafka clusters: Unless you have architectural needs that require you to do otherwise, use random partitioning when writing to topics. When you’re operating at scale, uneven data rates among partitions can be difficult to manage. There are three main reasons for this: First, consumers of the “hot” […]

Read More

Categories

July 2017
MTWTFSS
« Jun Aug »
 12
3456789
10111213141516
17181920212223
24252627282930
31