HBase Compaction

Kevin Feasel

2017-03-03

Hadoop

Jitendra Bafna explains how HBase compaction works:

Compaction is a process by which HBase cleans itself. It comes in two flavors: minor compaction and major compaction.

Minor compaction is the process of combining the configurable number of smaller HFiles into one Large HFile. Minor compaction is very important because without it, reading particular rows requires many disk reads and can reduce overall performance.

Major compaction is a process of combining the StoreFiles of regions into a single StoreFile. It also deletes remove and expired versions. By default, major compaction runs every 24 hours and merges all StoreFiles into single StoreFile. After compaction, if the new larger StoreFile is greater than a certain size (defined by property), the region will split into new regions.

Read on for more information about compaction and data locality, which is a totally different topic.

Related Posts

Mounting HDFS As A Local Filesystem

Guy Shilo looks at two techniques for mounting HDFS as a local filesystem: NFS Gateway is a HDFS component that enables the use to expose HDFS through NFS3 interface so that Linux machines can mount it and access it just as a local filesystem. The manual installation is quite cumbersome and is covered here. Cloudera manager […]

Read More

How Humio Uses Kafka

Kresten Krab describes ways that Humio uses Apache Kafka for their product: Humio is a log analytics system built to run both on-prem and as a hosted offering. It is designed for “on-prem first” because, in many logging use cases, you need the privacy and security of managing your own logging solution. And because volume […]

Read More

Categories

March 2017
MTWTFSS
« Feb Apr »
 12345
6789101112
13141516171819
20212223242526
2728293031