The Hadoop in Real World team explains why you might see 200 tasks when running a Spark job:
It is quite common to see 200 tasks in one of your stages, specifically at a stage that requires a wide transformation. The reason is that wide transformations in Spark require a shuffle. Operations like join and group by are wide transformations, and they trigger a shuffle.
Read on to learn why 200, and whether 200 is the right number for you.
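As a quick illustration (not from the original post), here is a minimal PySpark sketch showing the setting behind that number: spark.sql.shuffle.partitions defaults to 200 and controls how many tasks the post-shuffle stage runs. The sample data and session name are made up for the example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shuffle-partitions-demo").getOrCreate()

# spark.sql.shuffle.partitions defaults to 200; it sets how many partitions
# (and therefore tasks) a shuffle produces after a wide transformation.
print(spark.conf.get("spark.sql.shuffle.partitions"))  # 200 by default

df = spark.createDataFrame(
    [("a", 1), ("b", 2), ("a", 3)], ["key", "value"]
)

# groupBy is a wide transformation: the aggregation stage after the shuffle
# runs with 200 tasks unless the setting is changed.
df.groupBy("key").sum("value").explain()

# For small datasets you can lower the setting so the post-shuffle stage
# does not spin up 200 mostly empty tasks.
spark.conf.set("spark.sql.shuffle.partitions", "8")
```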