Press "Enter" to skip to content

Finding the Right Disk and Data Node Sizes in HDFS

Lokesh Jain has some advice when it comes to disk and data node size:

There are two factors to keep in mind when choosing node capacity. These will be discussed in detail in the next sections.

1. Large Disks – total node capacity being the same, using more disks is better as it yields higher aggregate IO bandwidth.
2. Dense Nodes – as nodes get denser, recovery after node failure takes longer.

These factors are not HDFS-specific and will impact any distributed storage service that replicates data for redundancy and serves live workloads.

Click through for specific advice on maximum disk and node sizes.