WebHCat

Kevin Feasel

2017-03-08

Hadoop

Jiang Mouren has a two-parter on WebHCat.  First, how it works:

The SSH shell and Oozie Hive actions interact directly with YARN for Hive execution, whereas programs using the HDInsight Jobs SDK or ADF (Azure Data Factory) use the WebHCat REST interface to submit jobs.

WebHCat is a REST interface for remote job execution (Hive, Pig, Sqoop, MapReduce). WebHCat translates job submission requests into YARN applications and reports status based on the YARN application status. Because WebHCat results come from YARN, troubleshooting some of them means going to YARN.
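As a rough sketch of what such a submission looks like, the snippet below builds (but does not send) a Hive job request and a status URL for WebHCat. The `/templeton/v1/hive` and `/templeton/v1/jobs/<id>` paths and the `user.name`, `execute`, and `statusdir` parameters are from the WebHCat REST API; the cluster name, user, query, status directory, and job ID are placeholders, and the helper function names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def webhcat_base(cluster_name):
    # Assumed URL shape for WebHCat behind the HDInsight gateway.
    return f"https://{cluster_name}.azurehdinsight.net/templeton/v1"


def build_hive_request(cluster_name, user, query, statusdir):
    """Build (not send) a POST that asks WebHCat to run a Hive query."""
    form = {
        "user.name": user,       # user the job runs as
        "execute": query,        # Hive statement(s) to run
        "statusdir": statusdir,  # where WebHCat writes job status and output
    }
    body = urlencode(form).encode("utf-8")
    return Request(f"{webhcat_base(cluster_name)}/hive", data=body, method="POST")


def job_status_url(cluster_name, user, job_id):
    """URL to GET for the status WebHCat reports for a submitted job."""
    params = urlencode({"user.name": user})
    return f"{webhcat_base(cluster_name)}/jobs/{quote(job_id)}?{params}"


# Placeholder values for illustration only.
req = build_hive_request("mycluster", "admin", "SHOW TABLES;", "/example/status")
print(req.get_method(), req.full_url)
print(job_status_url("mycluster", "admin", "job_1234_0001"))
```

Actually sending the request would also require the cluster's HTTP credentials, for example via `urllib.request.urlopen` with a basic-auth handler; that part is left out here.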

Then, how to debug issues:

2.1.2. WebHCat times out

The HDInsight Gateway times out responses that take longer than 2 minutes, resulting in “502 Bad Gateway”. WebHCat queries YARN services for job status, and if those queries take longer than that, the request might time out.

When this happens, collect the following logs for further investigation:

/var/log/webhcat. Typical contents of the directory look like this:

  • webhcat.log is the log4j log to which the server writes logs
  • webhcat-console.log is the stdout of the server when it is started
  • webhcat-console-error.log is the stderr of the server process

NOTE: webhcat.log rolls over daily, so files like webhcat.log.YYYY-MM-DD will also be present. For logs covering a specific time range, make sure the appropriate file is selected.
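To make the roll-over note concrete, here is a small sketch that works out which log files cover a given date range. It assumes the naming described above (the live `webhcat.log` for the current day, `webhcat.log.YYYY-MM-DD` for earlier days); the function names are made up for illustration, and it only computes file names rather than touching the file system.

```python
from datetime import date


def webhcat_log_for(day, today):
    """Return the log file name expected to hold entries for `day`."""
    if day > today:
        raise ValueError("no log exists yet for a future date")
    if day == today:
        # The current day's entries are still in the live log file.
        return "webhcat.log"
    # Earlier days are in the dated roll-over files.
    return f"webhcat.log.{day.isoformat()}"


def webhcat_logs_for_range(start, end, today):
    """Return file names covering every day from `start` to `end` inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    days = (date.fromordinal(n) for n in range(start.toordinal(), end.toordinal() + 1))
    return [webhcat_log_for(d, today) for d in days]


# Example: an incident that ran from 6 March into 8 March 2017, checked on 8 March.
files = webhcat_logs_for_range(date(2017, 3, 6), date(2017, 3, 8), today=date(2017, 3, 8))
print(files)
```

If a roll-over file for a quiet day is missing on the server, the list will still name it, so treat the result as a set of candidates to look for.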

Because HDInsight doesn’t support WebHDFS, WebHCat is the primary method for cluster access, so it’s good to know.
