Collecting Hadoop Metrics from Multiple Clusters

Dmitry Tolpeko shows how you can collate Hadoop metrics from several ElasticMapReduce clusters:

The first step is to dynamically get the list of clusters and their IPs. Hadoop clusters are often reprovisioned, added, and terminated, so you cannot rely on a static list of addresses. In the case of Amazon EMR, you can use the following Linux shell command to get the list of active clusters:

aws emr list-clusters --active

From its output you can get the cluster IDs and names. While a cluster's ID and IP can change over time, its name is usually permanent (like DEV or Adhoc-Analytics), so it is useful for various aggregation reports.
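As a rough sketch of the idea (not taken from the original post), the cluster IDs and names can be extracted with a JMESPath --query, and each cluster's master node IP looked up from its ID. The query paths assume the standard list-clusters and list-instances output shapes; clusters built on instance fleets would need --instance-fleet-type instead of --instance-group-types:

# Sketch: enumerate active EMR clusters and resolve each master's private IP.
aws emr list-clusters --active \
    --query 'Clusters[].[Id,Name]' --output text |
while read -r cluster_id cluster_name; do
    # Look up the master node's private IP for this cluster ID.
    master_ip=$(aws emr list-instances \
        --cluster-id "$cluster_id" \
        --instance-group-types MASTER \
        --query 'Instances[0].PrivateIpAddress' --output text)
    echo "$cluster_name ($cluster_id): $master_ip"
done

Each output line pairs the stable cluster name with its current ID and master IP, which is the kind of mapping you would want before polling any Hadoop metric endpoints.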

Read on to see what you can do with this list of clusters.
