The Hadoop in Real World team gives us three methods (and one synonym) to organize results in Hive:
Hive provides 3 options to order or sort the result of records – order by, sort by, cluster by and distribute by. Which option you choose has performance implications. So it is important to understand the difference between the options and choose the right one for the use case at hand.
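To make the distinction concrete, here is a minimal HiveQL sketch of the four clauses side by side. The `sales` table and its columns are hypothetical, invented only for illustration, and the comments describe the general behavior of each clause rather than anything specific to the linked article.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE IF NOT EXISTS sales (
  region      STRING,
  customer_id INT,
  amount      DOUBLE
);

-- ORDER BY: one total ordering over the whole result.
-- All rows pass through a single reducer, which can be slow on large data.
SELECT region, customer_id, amount
FROM sales
ORDER BY amount DESC;

-- SORT BY: rows are sorted within each reducer only,
-- so the combined output is not globally ordered when there are several reducers.
SELECT region, customer_id, amount
FROM sales
SORT BY amount DESC;

-- DISTRIBUTE BY: rows with the same region go to the same reducer,
-- but nothing is sorted.
SELECT region, customer_id, amount
FROM sales
DISTRIBUTE BY region;

-- DISTRIBUTE BY plus SORT BY: group rows by reducer, then sort inside each one.
SELECT region, customer_id, amount
FROM sales
DISTRIBUTE BY region
SORT BY region, amount DESC;

-- CLUSTER BY: shorthand for DISTRIBUTE BY and SORT BY on the same column(s).
SELECT region, customer_id, amount
FROM sales
CLUSTER BY region;
```

The last query is presumably the "synonym" in question: `CLUSTER BY region` behaves like `DISTRIBUTE BY region SORT BY region`, which leaves three distinct techniques.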
Click through for a high-level overview of the techniques.