Spark Metrics

Swaroop Ramachandra looks at some key metrics for Spark administration:

Once you have identified and broken down the Spark and associated infrastructure and application components you want to monitor, you need to understand the metrics that you should really care about that affects the performance of your application as well as your infrastructure. Let’s dig deeper into some of the things you should care about monitoring.

  1. In Spark, it is well known that Memory related issues are typical if you haven’t paid attention to the memory usage when building your application. Make sure you track garbage collection and memory across the cluster on each component, specifically, the executors and the driver. Garbage collection stalls or abnormality in patterns can increase back pressure.

There are a few metrics of note here.  Check it out.

Related Posts

Flint: Time Series With Spark

Li Jin and Kevin Rasmussen cover the concepts of Flint, a time-series library built on Apache Spark: Time series analysis has two components: time series manipulation and time series modeling. Time series manipulation is the process of manipulating and transforming data into features for training a model. Time series manipulation is used for tasks like data […]

Read More

Forcing MAXDOP In Azure SQL DB

Arun Sirpal shows us that you can change MAXDOP in Azure SQL Database: In this quick post I will show you my parallel plan and how I use MAXDOP = 1 to suppress parallel plan generation so the operation will be executed serially. (Disclaimer – I am not saying this is the right thing to […]

Read More

Categories

June 2016
MTWTFSS
« May Jul »
 12345
6789101112
13141516171819
20212223242526
27282930