WebHCat

Kevin Feasel

2017-03-08

Hadoop

Jiang Mouren has a two-parter on WebHCat.  First, how it works:

SSH shell/Oozie hive action directly interact with YARN for HIVE execution where as Program using HdInsight Jobs SDK/ADF (Azure Data Factory) uses WebHCat REST interface to submit the jobs.

WebHCat is a REST interface for remote jobs (Hive, Pig, Scoop, MapReduce) execution. WebHCat translates the job submission requests into YARN applications and reports the status based on the YARN application status. WebHCat results are coming from YARN and troubleshooting some of them needs to go to YARN.

Then, how to debug issues:

2.1.2. WebHCat times out

HDInsight Gateway times out responses which take longer than 2Minutes resulting in “502 BadGateway”. WebHCat queries YARN services for job status and if they take longer than the request might timeout.

When this happens collect the following logs for further investigation:

/var/log/webchat. Typical contents of directory will be like

  • webhcat.log is the log4j log to which server writes logs
  • webhcat-console.log is stdout of server is started.
  • webhcat-console-error.log is stderr of server process

NOTE: webhcat.log will roll-over daily hence files like webhcat.log.YYYY-MM-DD will also present. For logs to a specific time range make sure that appropriate file is selected.

Because HDInsight doesn’t support WebHDFS, WebHCat is the primary method for cluster access, so it’s good to know.

Related Posts

Mounting HDFS As A Local Filesystem

Guy Shilo looks at two techniques for mounting HDFS as a local filesystem: NFS Gateway is a HDFS component that enables the use to expose HDFS through NFS3 interface so that Linux machines can mount it and access it just as a local filesystem. The manual installation is quite cumbersome and is covered here. Cloudera manager […]

Read More

How Humio Uses Kafka

Kresten Krab describes ways that Humio uses Apache Kafka for their product: Humio is a log analytics system built to run both on-prem and as a hosted offering. It is designed for “on-prem first” because, in many logging use cases, you need the privacy and security of managing your own logging solution. And because volume […]

Read More

Categories

March 2017
MTWTFSS
« Feb Apr »
 12345
6789101112
13141516171819
20212223242526
2728293031