The first step is to get the list of clusters and their IP addresses dynamically. Hadoop clusters are often reprovisioned, added, and terminated, so you cannot rely on a static list of clusters and addresses. With Amazon EMR, you can use the following Linux shell command to list the active clusters:
aws emr list-clusters --active
Its output gives you the cluster IDs and names. While a cluster's ID and IP address can change over time, its name is usually permanent (for example, Adhoc-Analyticscluster), so the name is useful for various aggregation reports.
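Since the name is the stable part, one option is to reduce the command's output to ID and name pairs and look up the current ID by name. The following is a minimal sketch, not a definitive implementation: the function names list_active_clusters and cluster_id_by_name are hypothetical helpers introduced here, and it assumes the AWS CLI is installed and configured with credentials and a default region. It relies on the CLI's global --query and --output options.

```shell
#!/usr/bin/env bash

# Print one "ID<TAB>Name" line per active EMR cluster.
# Assumption: the AWS CLI is installed and configured.
list_active_clusters() {
    aws emr list-clusters --active \
        --query 'Clusters[].[Id, Name]' \
        --output text
}

# Look up the current ID of a cluster by its (usually permanent) name.
# Prints nothing if no active cluster has that name.
cluster_id_by_name() {
    local name="$1"
    list_active_clusters | awk -F'\t' -v n="$name" '$2 == n { print $1 }'
}
```

For example, cluster_id_by_name "Adhoc-Analyticscluster" would print the ID currently assigned to that cluster, if it is active. If several active clusters share a name, every matching ID is printed, one per line.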
Read on to see what you can do with this list of clusters.