Once a region exceeds the maximum configured region size, it splits, and a
matching split directory is created within the region directory. This size is configured
using the hbase.hregion.max.filesize property or, for an individual table, through
the HTableDescriptor instance.
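The size check can be sketched as a simple threshold test. This is a simplified model, not HBase's split-policy code: it assumes a region becomes a split candidate once any one of its stores (one per column family) exceeds the configured limit, and it assumes the 10 GB default that recent HBase releases use for hbase.hregion.max.filesize. `SplitCheckSketch` and `shouldSplit` are hypothetical names.

```java
import java.util.List;

// Simplified model of a size-based split check: a region is a split candidate
// once any of its stores (one per column family) grows past the maximum file size.
// Hypothetical names; this is not HBase's RegionSplitPolicy API.
final class SplitCheckSketch {

    // Assumed default for hbase.hregion.max.filesize (10 GB); verify against
    // the hbase-default.xml of your HBase version.
    static final long DEFAULT_MAX_FILESIZE = 10L * 1024 * 1024 * 1024;

    /** Returns true if any store's total file size is strictly above maxFileSize. */
    static boolean shouldSplit(List<Long> storeSizes, long maxFileSize) {
        for (long size : storeSizes) {
            if (size > maxFileSize) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // Two column families holding 3 GB and 11 GB of store files.
        System.out.println(shouldSplit(List.of(3 * gb, 11 * gb), DEFAULT_MAX_FILESIZE)); // true
        // Neither store exceeds the limit.
        System.out.println(shouldSplit(List.of(3 * gb, 4 * gb), DEFAULT_MAX_FILESIZE)); // false
    }
}
```

Checking each store rather than the region's combined size mirrors the idea that one oversized column family is enough to make the whole region split.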
When the MemStore flushes multiple times, the number of store files on disk
grows. A compaction process running in the background merges these files into
larger ones, up to the configured maximum file size, and can also trigger a region split.
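The flush and compaction cycle can be illustrated with a toy model of a single store. It assumes a hypothetical trigger (compact once a fixed number of files exist) and merges every file into one, which is much simpler than HBase's real compaction policies; it also ignores deleted cells and expired versions, so the merged size is just the sum of the inputs. `StoreSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one store: each MemStore flush adds a store file, and when the
// file count reaches a threshold, all files are merged into one. If the merged
// file exceeds the maximum file size, the store asks for a region split.
// Hypothetical names and trigger; not HBase's compaction implementation.
final class StoreSketch {
    private final List<Long> files = new ArrayList<>();
    private final int compactionThreshold; // file count that triggers a compaction
    private final long maxFileSize;        // split threshold in bytes
    private boolean splitRequested = false;

    StoreSketch(int compactionThreshold, long maxFileSize) {
        this.compactionThreshold = compactionThreshold;
        this.maxFileSize = maxFileSize;
    }

    /** A MemStore flush writes its contents out as one new store file. */
    void flush(long memstoreBytes) {
        files.add(memstoreBytes);
        if (files.size() >= compactionThreshold) {
            compact();
        }
    }

    /** Merge all store files into a single file and check the split threshold. */
    private void compact() {
        long merged = 0;
        for (long size : files) {
            merged += size;
        }
        files.clear();
        files.add(merged);
        if (merged > maxFileSize) {
            splitRequested = true;
        }
    }

    int fileCount() { return files.size(); }

    boolean splitRequested() { return splitRequested; }

    public static void main(String[] args) {
        StoreSketch store = new StoreSketch(3, 100);
        for (int i = 0; i < 3; i++) {
            store.flush(40); // three flushes of 40 bytes each
        }
        // The third flush triggers a compaction into one 120-byte file, which exceeds 100.
        System.out.println(store.fileCount() + " " + store.splitRequested()); // 1 true
    }
}
```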
Data replication
Data replication copies data from one cluster to another by replicating
the writes as the first cluster received them. Intercluster replication in HBase,
including between geographically distant clusters, is achieved through asynchronous
log shipping. Data replication serves as a disaster recovery solution and also
provides higher availability at the HBase layer.
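Asynchronous log shipping can be sketched with a queue and a background thread. In this minimal model the source acknowledges a write as soon as it is applied and appended to its log, and a separate shipper thread later replays the logged edits, in order, on the destination. The in-memory queue standing in for the log, the sentinel used to stop the shipper, and all class and method names are hypothetical; HBase itself reads the WAL and ships entries to the peer cluster over RPC.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal model of asynchronous log shipping: writes return after being logged
// locally; a background thread copies logged edits to the destination in order.
// Hypothetical names; not HBase's replication source/sink classes.
final class LogShippingSketch {
    private static final String POISON = "__STOP__"; // sentinel that ends the shipper loop

    final List<String> sourceData = new CopyOnWriteArrayList<>();
    final List<String> destinationData = new CopyOnWriteArrayList<>();
    private final BlockingQueue<String> log = new LinkedBlockingQueue<>();
    private final Thread shipper = new Thread(this::shipLoop, "log-shipper");

    LogShippingSketch() {
        shipper.start();
    }

    /** Client write: applied and logged locally; does not wait for replication. */
    void write(String edit) {
        sourceData.add(edit);
        log.add(edit);
    }

    private void shipLoop() {
        try {
            while (true) {
                String edit = log.take(); // blocks until an edit has been logged
                if (edit.equals(POISON)) {
                    return;
                }
                destinationData.add(edit); // replay the edit on the remote cluster
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Ship all remaining edits, then stop the shipper thread. */
    void close() {
        log.add(POISON);
        try {
            shipper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        LogShippingSketch repl = new LogShippingSketch();
        repl.write("put row1");
        repl.write("put row2");
        repl.close();
        System.out.println(repl.destinationData); // [put row1, put row2]
    }
}
```

Because `write` returns before the shipper has copied the edit, the destination can briefly lag behind the source; that lag is the trade-off asynchronous replication makes in exchange for not slowing down client writes.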
HBase replication follows a master-push pattern. Because each region server has
its own write-ahead log (WAL), it is straightforward to keep track of what is
currently being replicated. One master cluster can replicate to any number of slave
clusters. Each region server replicates its own batches of WAL edit records
(the default batch size is 64 MB).
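Per-region-server batching can be sketched as grouping WAL edits by size. The 64 MB capacity comes from the text above; the exact rule used here (close the current batch before the next edit would overflow it, and let a single oversized edit ship alone) is an assumed simplification, and edits are represented only by their sizes in bytes. `WalBatcher` and `batch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Groups one region server's WAL edits (modeled as byte sizes) into shipping
// batches whose total stays within a capacity. The batching rule is an assumed
// simplification, not HBase's actual replication reader logic.
final class WalBatcher {
    static final long DEFAULT_BATCH_CAPACITY = 64L * 1024 * 1024; // 64 MB

    static List<List<Long>> batch(List<Long> editSizes, long capacity) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : editSizes) {
            // Close the current batch if this edit would push it past capacity.
            if (!current.isEmpty() && currentBytes + size > capacity) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size); // an edit larger than capacity still ships, alone
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        List<Long> edits = List.of(30 * mb, 30 * mb, 10 * mb, 70 * mb);
        // Batches: [30 MB, 30 MB], [10 MB], [70 MB]
        System.out.println(batch(edits, DEFAULT_BATCH_CAPACITY).size()); // 3
    }
}
```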
The master-push pattern used for cluster replication can be designed in three
different ways:
Master-slave replication: In this type of replication, all writes go to the
primary cluster (master) first and are then replicated to the secondary cluster
(slave). This rule must be enforced at the application level, because HBase
itself does not enforce it. If the application writes data to the secondary
cluster, that data is never replicated to the master cluster.
[Figure: Master-slave replication between HBase Cluster1 and HBase Cluster2. Each cluster has a Master, ZooKeeper, and Region Servers running on HDFS.]
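The one-way nature of master-slave replication can be shown with a toy model. Each cluster forwards its own client writes to its configured peers, and only the master has a peer, so a write made directly on the slave never flows back. For simplicity, replication is applied synchronously here, whereas HBase ships edits asynchronously; `ReplicatedCluster` and its methods are hypothetical names, not HBase's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of master-slave replication: client writes are pushed to every
// configured peer; replicated edits are applied but not forwarded again.
// Hypothetical names; replication is synchronous here purely for simplicity.
final class ReplicatedCluster {
    final Map<String, String> rows = new TreeMap<>();
    private final List<ReplicatedCluster> peers = new ArrayList<>();

    void addPeer(ReplicatedCluster peer) {
        peers.add(peer);
    }

    /** Client write: stored locally, then pushed to each peer. */
    void put(String row, String value) {
        rows.put(row, value);
        for (ReplicatedCluster peer : peers) {
            peer.applyReplicated(row, value);
        }
    }

    /** Edit received through replication: applied locally only. */
    void applyReplicated(String row, String value) {
        rows.put(row, value);
    }

    public static void main(String[] args) {
        ReplicatedCluster master = new ReplicatedCluster(); // e.g. HBase Cluster1
        ReplicatedCluster slave = new ReplicatedCluster();  // e.g. HBase Cluster2
        master.addPeer(slave); // one direction only: master -> slave

        master.put("row1", "a"); // replicated to the slave
        slave.put("row2", "b");  // written to the slave directly; stays there

        System.out.println(master.rows); // {row1=a}
        System.out.println(slave.rows);  // {row1=a, row2=b}
    }
}
```

Since nothing in the model stops `slave.put`, keeping all writes on the master is left to the application, which is the point made about enforcement above.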