Justin Kestelyn discusses the Fair Scheduler in YARN:
Assume that we have a YARN cluster with total resources <memory: 800GB, vcores: 200> with two queues: root.busy (weight=1.0) and root.sometimes_busy (weight=3.0). There are generally four scenarios of interest:
- Scenario A: The busy queue is full with applications, and sometimes_busy queue has a handful of running applications (say 10%, i.e. <memory: 80GB, vcores: 20>). Soon, a large number of applications are added to the sometimes_busy queue in a relatively short time window. All the new applications in sometimes_busy will be pending, and will become active as containers finish up in the busy queue. If the tasks in the busy queue are fairly short-lived, then the applications in the sometimes_busy queue will not wait long to get containers assigned. However, if the tasks in the busy queue take a long time to finish, the new applications in the sometimes_busy queue will stay pending for a long time. In either case, as the applications in the sometimes_busy queue become active, many of the running applications in the busy queue will take much longer to finish.
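To put numbers on Scenario A, here is a minimal Python sketch of how the two weights would split the cluster if shares were simply proportional to weight. It is not Fair Scheduler code: the `Resources` class and `weighted_fair_shares` function are hypothetical names made up for illustration, the calculation ignores minimum/maximum resources, demand, and scheduling policy, and the usage figure for the busy queue assumes that "full" means it holds everything the other queue is not using.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical container for the two resource dimensions in the example."""
    memory_gb: float
    vcores: float


def weighted_fair_shares(total, weights):
    """Split total resources across queues in proportion to their weights.

    Simplification: real schedulers also account for demand and limits.
    """
    total_weight = sum(weights.values())
    return {
        name: Resources(
            memory_gb=total.memory_gb * weight / total_weight,
            vcores=total.vcores * weight / total_weight,
        )
        for name, weight in weights.items()
    }


# Cluster size and queue weights taken from the quoted example.
cluster = Resources(memory_gb=800, vcores=200)
weights = {"root.busy": 1.0, "root.sometimes_busy": 3.0}
shares = weighted_fair_shares(cluster, weights)

# Scenario A starting point: sometimes_busy uses 10% of the cluster.
# Assumption: the busy queue holds all remaining resources.
usage = {
    "root.sometimes_busy": Resources(memory_gb=80, vcores=20),
    "root.busy": Resources(memory_gb=720, vcores=180),
}

for name, share in shares.items():
    used = usage[name]
    delta_gb = used.memory_gb - share.memory_gb
    print(
        f"{name}: share={share.memory_gb:.0f}GB/{share.vcores:.0f} vcores, "
        f"used={used.memory_gb:.0f}GB, over/under share by {delta_gb:+.0f}GB"
    )
```

Under these assumptions the busy queue sits far above its weighted share while sometimes_busy sits far below its own, which is why the new sometimes_busy applications can only start as busy containers finish.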
If you’re interested in a deeper dive into YARN, this is a good series to start with.