How Qubole Optimizes Apache Spark Clusters

Mikhail Stolpner gives us some tips on how to optimize Apache Spark clusters:

There are four major resources: memory, compute (CPU), disk, and network. Memory and compute are by far the most expensive. Understanding how much compute and memory your application requires is crucial for optimization.

You can configure how much memory and how many CPUs each executor gets. While the number of CPUs for each task is fixed, executor memory is shared between the tasks processed by a single executor.

A few key parameters provide the most impact on how Spark is executed in terms of resources: spark.executor.memoryspark.executor.coresspark.task.cpus, spark.executor.instances, and spark.qubole.max.executors.

This article gives us some idea of the levers we have available as well as when to pull them.  Though the article itself is vendor-specific, a lot of the advice is general.

Related Posts

Hortonworks Data Platform 3.0 Released

Saumitra Buragohain, et al, announce the newest version of the Hortonworks Data Platform: Highlighted Apache Hive features include: Workload management for LLAP:  You can assign resource pools within LLAP pool and allocate resources on a per user or per group basis. This enables support for large multi-tenant deployments. ACID v2 and ACID on by default:  We are […]

Read More

Replicating Data In HDFS Between Clusters

Murali Ramasami and Niru Anisetti have an article showing how to use the Hortonworks Data Lifecycle Manager to set up replication between two Hadoop clusters: Data Lifecycle Manager (DLM) delivers on the promise of location-agnostic, secure replication by encapsulating and copying data seamlessly across physical private storage and public cloud environments. This empowers businesses to […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

July 2018
MTWTFSS
« Jun  
 1
2345678
9101112131415
16171819202122
23242526272829
3031