Spark Metrics

Swaroop Ramachandra looks at some key metrics for Spark administration:

Once you have identified and broken down the Spark and associated infrastructure and application components you want to monitor, you need to understand the metrics that you should really care about that affects the performance of your application as well as your infrastructure. Let’s dig deeper into some of the things you should care about monitoring.

  1. In Spark, it is well known that Memory related issues are typical if you haven’t paid attention to the memory usage when building your application. Make sure you track garbage collection and memory across the cluster on each component, specifically, the executors and the driver. Garbage collection stalls or abnormality in patterns can increase back pressure.

There are a few metrics of note here.  Check it out.

Related Posts

Data Discovery And Classification In SQL Server

Gilad Mittelman explains how the SQL Information Protection (aka Data Discovery and Classification) process works in SQL Server and Azure SQL Database: SQL Information Protection (SQL IP) introduces a set of advanced services and new SQL capabilities, forming a new information protection paradigm in SQL aimed at protecting the data, not just the database: Discovery […]

Read More

Streaming ETL In Practice Using KSQL

Robin Moffatt builds an example of streaming ETL using Oracle, GoldenGate, and Kafka: So in this post I’m going to show an example of what streaming ETL looks like in practice. I’m replacing batch extracts with event streams, and batch transformation with in-flight transformation of these event streams. We’ll take a stream of data from […]

Read More

Categories

June 2016
MTWTFSS
« May Jul »
 12345
6789101112
13141516171819
20212223242526
27282930