Press "Enter" to skip to content

Bucketing Data in Hive

Chitra Sapkal explains why bucketing in Hive can be so powerful:

When a column has a high cardinality, we can’t perform partitioning on it. A very high number of partitions will generate too many Hadoop files which would increase the load on the node. That’s because the node will have to keep the metadata of every partition, and that would affect the performance of that node

In simple words, You can use bucketing if you need to run queries on columns that have huge data, which makes it difficult to create partitions.

Click through to see how bucketing works and examples of how you can use it to make queries faster.