Collecting Hadoop Metrics from Multiple Clusters

Dmitry Tolpeko shows how you can collate Hadoop metrics from several ElasticMapReduce clusters:

The first step is to dynamically get the list of clusters and their IPs. Hadoop clusters are often reprovisioned, added and terminated, so you cannot use the static list and addresses. In case of Amazon EMR, you can use the following Linux shell command to get the list of active clusters:

aws emr list-clusters --active

From its output you can get the cluster IDs and names. As a cluster ID and IP can change over time, its name is usually permanent (like DEV or Adhoc-Analytics cluster) so it can be useful for various aggregation reports.

Read on to see what you can do with this list of clusters.

Related Posts

Databricks Runtime 5.4

Todd Greenstein announces Databricks Runtime 5.4: We’ve partnered with the Data Services team at Amazon to bring the Glue Catalog to Databricks.   Databricks Runtime can now use Glue as a drop-in replacement for the Hive metastore. This provides several immediate benefits:– Simplifies manageability by using the same glue catalog across multiple Databricks workspaces.– Simplifies integrated […]

Read More

Running Confluent Platform with .NET

Niels Berglund shows how you can install Confluent Platform as a Docker container and use the .NET client against it: What we see in Figure 16 are the various project related files, including the source file Program.cs. What is missing now is a Kafka client. For .NET there exists a couple of clients, and theoretically, you can use […]

Read More

Categories

May 2019
MTWTFSS
« Apr Jun »
 12345
6789101112
13141516171819
20212223242526
2728293031