
Broadcast Variables in Apache Spark

The Hadoop in Real World team explains the notion of broadcast variables in Apache Spark:

Broadcast variables are variables that are available in all executors executing the Spark application. These variables are already cached and ready to be used by tasks executing as part of the application. Broadcast variables are sent to each executor only once and are then available to all tasks executing in that executor.

Read on to understand when they are useful and, just as importantly, when not to use them. They seem like the type of thing that a newer developer could easily misuse.