NodeGroup Performance Issues

Babak Behzad explains potential Hadoop NodeGroup performance bottlenecks:

As can be seen in the logs, the localityWaitFactor value is 1, but the delay that this code causes grows linearly with the number of required containers. Since our DFSIO-large benchmark creates 1,024 files, each 1 GB in size, it requests 1,024 YARN containers. Therefore, the code has to miss at least 1,024 scheduling opportunities until it schedules containers on this (wrongly assumed) OFF_SWITCH node.

But why is this delay enforced? This idea falls into a big area of scheduling research. The Delay Scheduling algorithm was introduced by Matei Zaharia’s EuroSys ’10 paper titled “Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling”.

That post is a bit deeper than my Hadoop administration comfort level, but if you’re given the task of performance tuning a cluster, this might be one place to look.

Related Posts

Row-By-Row Is Slow-By-Slow

Lukas Eder points out that row-by-row updates are a great way of slowing down your system: The best way to find out is to benchmark. I’m doing two benchmarks for this: One that is run in PL/SQL, showing the performance difference between different approaches that are available to PL/SQL (namely looping, the FORALL syntax, and a single […]

Read More

Five Books For Learning Kafka

Data Flair has a guide to five books to help you learn Apache Kafka: The book “Kafka: The Definitive Guide” is written by engineers from Confluent andLinkedIn who are responsible for developing Kafka. They explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. It contains detailed examples as well. […]

Read More

Categories

July 2016
MTWTFSS
« Jun Aug »
 123
45678910
11121314151617
18192021222324
25262728293031