Spark Memory Management on EMR

Karunanithi Shanmugam gives us some tips on memory management for Spark in Amazon’s ElasticMapReduce:

Amazon EMR provides high-level information on how it sets the default values for Spark parameters in the release guide. These values are automatically set in the spark-defaults settings based on the core and task instance types in the cluster.

To use all the resources available in a cluster, set the maximizeResourceAllocation parameter to true. This EMR-specific option calculates the maximum compute and memory resources available for an executor on an instance in the core instance group. It then sets these parameters in the spark-defaults settings. Even with this setting, generally the default numbers are low and the application doesn’t use the full strength of the cluster. For example, the default for spark.default.parallelism is only 2 x the number of virtual cores available, though parallelism can be higher for a large cluster.

Spark on YARN can dynamically scale the number of executors used for a Spark application based on the workloads. Using Amazon EMR release version 4.4.0 and later, dynamic allocation is enabled by default (as described in the Spark documentation).

There’s a lot in here, much of which applies to Spark in general and not just EMR.

Related Posts

Pivoting Spark DataFrames

Unmesha Sreeveni shows how we can pivot a DataFrame in Apache Spark using one line of code: A pivot can be thought of as translating rows into columns while applying one or more aggregations. Lets see how we can achieve the same using the above dataframe. We will pivot the data based on “Item” column. […]

Read More

Troubleshooting Spark Performance

Bikas Saha and Mridul Murlidharan explain some of the basics of performance tuning with Apache Spark: Our objective was to build a system that would provide an intuitive insight into Spark jobs that not just provides visibility but also codifies the best practices and deep experience we have gained after years of debugging and optimizing […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

April 2019
MTWTFSS
« Mar  
1234567
891011121314
15161718192021
22232425262728
2930