The YARN Fair Scheduler

Kevin Feasel

2016-06-17

Hadoop

Justin Kestelyn discusses the Fair Scheduler in YARN:

Assume that we have a YARN cluster with total resources <memory: 800GB, vcores 200> with two queues: root.busy (weight=1.0) and root.sometimes_busy (weight 3.0).  There are generally four scenarios of interest:

 

  • Scenario A: The busy queue is full with applications, and sometimes_busy queue has a handful of running applications (say 10%, i.e. <memory: 80GB, vcores: 20>). Soon, a large number of applications are added to the sometimes_busy queue in a relatively short time window. All the new applications in sometimes_busy will be pending, and will become active as containers finish up in thebusy queue. If the tasks in the busy queue are fairly short-lived, then the applications in thesometimes_busy queue will not wait long to get containers assigned. However, if the tasks in the busyqueue take a long time to finish, the new applications in the sometimes_busy queue will stay pending for a long time. In either case, as the applications in the sometimes_busy queue become active, many of the running applications in the busy queue will take much longer to finish.

 

If you’re interested in a deeper dive into YARN, this is a good series to start with.

Related Posts

Testing Kafka Streams Applications

Yeva Byzek continues her series on testing Kafka-based streaming applications: When you create a stream processing application with Kafka’s Streams API, you create a Topologyeither using the StreamsBuilder DSL or the low-level Processor API. Normally, the topology runs with the KafkaStreams class, which connects to a Kafka cluster and begins processing when you call start(). For testing though, connecting to a running […]

Read More

Auto ML With SQL Server 2019 Big Data Clusters

Marco Inchiosa has a model scenario for using Big Data Clusters to scale out a machine learning problem: H2O provides popular open source software for data science and machine learning on big data, including Apache SparkTM integration. It provides two open source python AutoML classes: h2o.automl.H2OAutoML and pysparkling.ml.H2OAutoML. Both APIs use the same underlying algorithm implementations, […]

Read More

Categories

June 2016
MTWTFSS
« May Jul »
 12345
6789101112
13141516171819
20212223242526
27282930