First, I noticed that the job used only 100 containers, i.e. just one container per cluster node. This was very suspicious: Hive uses the Apache Tez execution engine, which runs only one task at a time in each container, so the whole job could run at most 100 tasks in parallel.
Looking at the Hive script, I found:
set hive.tez.container.size = 10240; -- 10 GB
Looks like someone had a memory problem with this query before and wanted to solve it once and for all!
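To get a feel for how the container size limits parallelism, here is a rough back-of-the-envelope sketch. It is not how YARN or Tez actually schedule work. It assumes memory is the only constraint, that each request is rounded up to a multiple of the scheduler's minimum allocation, and it uses hypothetical numbers (16 GB of YARN memory per node, 1 GB minimum allocation, 100 nodes) rather than this cluster's real configuration.

```python
# Back-of-the-envelope sketch: how many Tez tasks can run at once for a given
# hive.tez.container.size (in MB). All cluster figures are HYPOTHETICAL.
NODE_MEMORY_MB = 16384    # assumed YARN memory available per node
MIN_ALLOCATION_MB = 1024  # assumed scheduler minimum allocation
NODES = 100               # assumed number of cluster nodes


def normalized_request(container_mb: int, min_alloc_mb: int) -> int:
    """Round a container request up to a multiple of the minimum allocation."""
    return -(-container_mb // min_alloc_mb) * min_alloc_mb


def max_concurrent_tasks(container_mb: int,
                         node_mb: int = NODE_MEMORY_MB,
                         min_alloc_mb: int = MIN_ALLOCATION_MB,
                         nodes: int = NODES) -> int:
    """One Tez task per container at a time, so parallelism equals the container count."""
    per_node = node_mb // normalized_request(container_mb, min_alloc_mb)
    return per_node * nodes


if __name__ == "__main__":
    for size_mb in (10240, 4096, 2048):
        print(f"{size_mb} MB -> {max_concurrent_tasks(size_mb)} concurrent tasks")
    # 10240 MB -> 100 concurrent tasks
    # 4096 MB -> 400 concurrent tasks
    # 2048 MB -> 800 concurrent tasks
```

Under these assumptions, a 10 GB container leaves room for only one container per 16 GB node, while smaller containers multiply the number of tasks that can run side by side.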
Read on to see why this was not a great idea.