Press "Enter" to skip to content

Identifying Straggler Tasks in Spark Applications

Ajay Gupta clues us in on a process:

What Is a Straggler in a Spark Application?

A straggler refers to a very very slow executing Task belonging to a particular stage of a Spark application (Every stage in Spark is composed of one or more Tasks, each one computing a single partition out of the total partitions designated for the stage). A straggler Task takes an exceptionally high time for completion as compared to the median or average time taken by other tasks belonging to the same stage. There could be multiple stragglers in a Spark Job being present either in the same stage or across multiple stages. 

Read on to understand the consequences and causes of these straggler tasks, as well as what you can do about them.