
Partitioning vs Bucketing in Hive

The Hadoop in Real World team explains the difference between partitioning and bucketing in Apache Hive tables:

Now let’s say you also filter the sales records by sku (stock-keeping unit, a.k.a. barcode) in addition to sale_date and country. Creating a partition on sku will result in a very large number of partitions, which is not ideal because most of them will be small and uneven in size.

Hadoop is not efficient at processing small volumes of data. There is a better way.
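
To make the contrast concrete, here is a minimal Python sketch rather than actual Hive code. It assumes the better way is bucketing on sku, as the title suggests. The sales rows, the bucket count of 4, and the helper `bucket_for` are all made up for illustration, and CRC32 is only a stand-in for whatever hash function Hive applies internally.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 4  # hypothetical bucket count, fixed when the table is defined


def bucket_for(sku: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a sku to a bucket index using a deterministic hash.

    CRC32 is an assumption made for this sketch; Hive uses its own hashing.
    """
    return zlib.crc32(sku.encode("utf-8")) % num_buckets


# Made-up sales rows: (sale_date, country, sku)
sales = [("2024-01-01", "US", f"SKU-{i:05d}") for i in range(1000)]

# Partitioning on sku: one partition per distinct sku value.
partitions_by_sku = {sku for _, _, sku in sales}

# Bucketing on sku: every row lands in one of a fixed number of buckets.
rows_per_bucket = Counter(bucket_for(sku) for _, _, sku in sales)

print("partitions if partitioned by sku:", len(partitions_by_sku))
print("buckets used if bucketed by sku:", len(rows_per_bucket))
print("rows per bucket:", dict(rows_per_bucket))
```

The number of sku partitions grows with the number of distinct skus, while the number of buckets stays at whatever was chosen up front, so each bucket holds a larger share of the rows.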

Read on to understand when each technique makes sense.