Lokesh Jain has some advice when it comes to disk and data node size:
There are two factors to keep in mind when choosing node capacity. These will be discussed in detail in the next sections.
1. Large Disks – total node capacity being the same, using more disks is better as it yields higher aggregate IO bandwidth.
2. Dense Nodes – as nodes get denser, recovery after node failure takes longer.
These factors are not HDFS-specific and will impact any distributed storage service that replicates data for redundancy and serves live workloads.
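To make the two factors concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the linked article: the function names, the per-disk throughput, and the cluster-wide re-replication rate are all illustrative assumptions, and real clusters will behave differently depending on workload and throttling settings.

```python
# Back-of-the-envelope model of the two factors above.
# All figures passed in are hypothetical; nothing here comes from the article.

def aggregate_bandwidth_mb_s(node_capacity_tb, disk_size_tb, per_disk_mb_s):
    """Aggregate sequential bandwidth of one node.

    Assumes every disk delivers the same throughput and that the disks
    work in parallel with no controller or network bottleneck.
    """
    disk_count = node_capacity_tb / disk_size_tb
    return disk_count * per_disk_mb_s


def recovery_hours(node_capacity_tb, cluster_recovery_mb_s):
    """Hours needed to re-replicate a failed node's data.

    Assumes the node was full and that the rest of the cluster copies
    the lost replicas at a fixed combined rate.
    """
    total_mb = node_capacity_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return total_mb / cluster_recovery_mb_s / 3600


# Same 48 TB node built from small disks versus large disks (assumed 150 MB/s each).
many_small_disks = aggregate_bandwidth_mb_s(48, 4, 150)
few_large_disks = aggregate_bandwidth_mb_s(48, 12, 150)

# Doubling node density at an assumed 2000 MB/s cluster-wide recovery rate.
recovery_48tb = recovery_hours(48, 2000)
recovery_96tb = recovery_hours(96, 2000)

print(many_small_disks, few_large_disks)
print(round(recovery_48tb, 1), round(recovery_96tb, 1))
```

Under these assumptions, twelve 4 TB disks give three times the aggregate bandwidth of four 12 TB disks for the same node capacity, and doubling node capacity doubles the time the cluster spends under-replicated after a failure.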
Click through for specific advice on maximum disk and node sizes.