HBase Compaction

Kevin Feasel

2017-03-03

Hadoop

Jitendra Bafna explains how HBase compaction works:

Compaction is a process by which HBase cleans itself. It comes in two flavors: minor compaction and major compaction.

Minor compaction is the process of combining a configurable number of smaller HFiles into one large HFile. Minor compaction is very important because, without it, reading a particular row can require many disk reads and reduce overall performance.
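As a rough illustration of that "configurable number," the count of HFiles a store must accumulate before a minor compaction is considered is governed by the hbase.hstore.compactionThreshold setting, and you can also request a compaction through the Java client's Admin API. A minimal sketch, assuming a reachable cluster; the table name "my_table" is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class MinorCompactionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Minimum number of HFiles in a store before a minor compaction may run.
        conf.setInt("hbase.hstore.compactionThreshold", 3);

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Ask the region servers to compact the table's stores; HBase decides
            // whether this runs as a minor compaction based on the files present.
            admin.compact(TableName.valueOf("my_table")); // hypothetical table name
        }
    }
}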

Major compaction is the process of combining the StoreFiles of a region into a single StoreFile. It also deletes removed and expired versions. By default, major compaction runs every 24 hours and merges all StoreFiles into a single StoreFile. After compaction, if the new, larger StoreFile is greater than a certain size (defined by a configuration property), the region will split into new regions.
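For comparison, here is a minimal sketch of the major compaction knobs, assuming the standard HBase property names: hbase.hregion.majorcompaction sets the interval (in milliseconds) between automatic major compactions, hbase.hregion.max.filesize is the split threshold mentioned above, and Admin.majorCompact triggers a major compaction on demand. The table name is again a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class MajorCompactionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Interval between automatic major compactions, in milliseconds (24 hours).
        conf.setLong("hbase.hregion.majorcompaction", 24L * 60 * 60 * 1000);
        // Region size threshold; once a region grows past this, it splits.
        conf.setLong("hbase.hregion.max.filesize", 10L * 1024 * 1024 * 1024); // 10 GB

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Trigger a major compaction now instead of waiting for the schedule.
            admin.majorCompact(TableName.valueOf("my_table")); // hypothetical table name
        }
    }
}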

Read on for more information about compaction and data locality, which is a totally different topic.

