Getting Execution Plans In Spark

Anubhav Tarar shows how to get an execution plan for a Spark job:

There are three types of logical plans:

  1. Parsed logical plan.
  2. Analyzed logical plan.
  3. Optimized logical plan.

Analyzed logical plans go through a series of rules to resolve. Then, the optimized logical plan is produced. The optimized logical plan normally allows Spark to plug in a set of optimization rules. You can plug in your own rules for the optimized logical plan.

Click through for the details.

Related Posts

COUNT(*) Versus COUNT(1)

Lukas Eder takes on the myth that COUNT(*) differs from COUNT(1): Now that we know the theory behind these COUNT expressions, what’s the difference between COUNT(*) and COUNT(1). There is none, effectively. The 1 expression in COUNT(1) evaluates a constant expression for each row in the group, and it can be proven that this constant expression will never evaluate to NULL, so effectively, we’re running COUNT(*), […]

Read More

Spark Streaming DStreams

Manish Mishra explains the fundamental abstraction of Spark Streaming: Before going into details of the operations available on the DStream API, let us look at the input sources from which we can start a Stream. There are multiple ways in which we can get the inputs from e.g. Kafka, Flume, etc. Or simple Idle files. […]

Read More

Categories