Press "Enter" to skip to content

Order, Sort, Cluster, and Distribute in Hive

The Hadoop in Real World team give us three methods (and one synonym) to organize results in Hive:

Hive provides 3 options to order or sort the result of records – order by, sort by, cluster by and distribute by. Which option you choose has performance implications. So it is important to understand the difference between the options and choose the right one for the use case at hand.

Click through for a high-level overview of the techniques.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.