Justin Kestelyn discusses the Fair Scheduler in YARN:
Assume that we have a YARN cluster with total resources <memory: 800GB, vcores: 200> with two queues:
root.busy (weight=1.0) and root.sometimes_busy (weight=3.0). There are generally four scenarios of interest:
- Scenario A: The busy queue is full with applications, and the sometimes_busy queue has a handful of running applications (say 10%, i.e. <memory: 80GB, vcores: 20>). Soon, a large number of applications are added to the sometimes_busy queue in a relatively short time window. All the new applications in sometimes_busy will be pending, and will become active as containers finish up in the busy queue. If the tasks in the busy queue are fairly short-lived, then the applications in the sometimes_busy queue will not wait long to get containers assigned. However, if the tasks in the busy queue take a long time to finish, the new applications in the sometimes_busy queue will stay pending for a long time. In either case, as the applications in the sometimes_busy queue become active, many of the running applications in the busy queue will take much longer to finish.
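To put numbers on the scenario, here is a rough sketch of the share each queue would be entitled to, assuming the two sibling queues split the cluster in proportion to their weights and that no minimum or maximum resource limits are configured. The function name `weighted_fair_shares` and the dictionary layout are illustrative only and are not part of any YARN API.

```python
def weighted_fair_shares(total, weights):
    """Split each resource in `total` across queues in proportion to weight.

    Illustrative helper (not a YARN API); assumes no min/max resource limits.
    """
    weight_sum = sum(weights.values())
    return {
        queue: {res: amount * w / weight_sum for res, amount in total.items()}
        for queue, w in weights.items()
    }


# Cluster size and queue weights taken from the example above.
cluster = {"memory_gb": 800, "vcores": 200}
weights = {"root.busy": 1.0, "root.sometimes_busy": 3.0}

shares = weighted_fair_shares(cluster, weights)
for queue, share in shares.items():
    print(queue, share)
```

Under these assumptions, sometimes_busy would be entitled to about 600GB and 150 vcores, far above the 80GB and 20 vcores it is using when Scenario A begins, which is why its new applications take over capacity as containers in busy finish.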
If you’re interested in a deeper dive into YARN, this is a good series to start with.