Snappy: Developed by Google, this algorithm trades compression ratio for
speed: the compressed data is not as small, but compression is very fast.
GZIP: Unlike Snappy, this algorithm achieves a very good compression ratio,
so the compressed data takes less disk space, but at the cost of
compression speed.
Once the compression algorithm is installed, the HBase region server should be
configured to verify at startup that the algorithm is installed correctly.
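As a sketch of such a startup check, assuming the hbase.regionserver.codecs property (not named in the text above) and the codec names snappy and gz, the entry in hbase-site.xml might look like this:

```xml
<!-- Assumption: codecs listed here are tested when the region server starts -->
<property>
  <name>hbase.regionserver.codecs</name>
  <value>snappy,gz</value>
</property>
```

With a setting like this, a region server on which one of the listed codecs is missing or misinstalled should fail to start rather than fail later when a table tries to use the codec.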
Load balancing
The balancer is a built-in feature of the HBase master that runs periodically, at
the interval set by the hbase.balancer.period property (the default is 5 minutes).
Its main job is to even out the number of assigned regions per region server. It
first identifies the regions to be moved, and then moves them to other region
servers. The upper limit for how long a balancer run can take is half of the
balancer period.
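The balancer interval can be set explicitly in hbase-site.xml. The fragment below is a minimal sketch that simply spells out the 5-minute default, assuming the value is given in milliseconds:

```xml
<!-- 5 minutes = 5 * 60 * 1000 ms (assumed unit: milliseconds) -->
<property>
  <name>hbase.balancer.period</name>
  <value>300000</value>
</property>
```

With this period, a single balancer run would be capped at roughly 150000 ms, that is, half of the period.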
The balancer can also be invoked on demand from the HBase shell or through the API,
and it can be enabled or disabled using the balancer switch (balance_switch in the
HBase shell).
Splitting regions
In HBase, once a region reaches the configured maximum size (the default is 1 GB),
it is, by default, split into two halves. These halves then take on more data and
grow as new regions. The downside is that if multiple regions grow at the same
rate, they end up splitting at the same time, which can cause a heavy burst of I/O.
Hence, it is recommended to split regions manually rather than rely on automatic
splitting. The manual approach gives better control over each region. To
effectively disable automatic splitting, raise the maximum region size to a large
value, say 100 GB, so that it is rarely triggered and the administrator can perform
splits manually. The region size is defined in hbase-site.xml, as shown in the
following configuration:
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
</property>
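The value is expressed in bytes. Assuming binary gigabytes (1 GB = 1024^3 bytes), the 100 GB limit works out as follows:

```latex
100 \times 1024^{3} = 100 \times 1\,073\,741\,824 = 107\,374\,182\,400 \text{ bytes}
```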
The region to split can also be chosen based on the largest store file in the
region: as the data grows, that region gets larger and becomes the selected
candidate for a region split.
 