Database Reference
In-Depth Information
on the assignment where the crashed server left off. At a minimum, when bootstrapping a
client connection to an HBase cluster, the client must be passed the location of the
ZooKeeper ensemble. Thereafter, the client navigates the ZooKeeper hierarchy to learn
cluster attributes such as server locations.
Regionserver worker nodes are listed in the HBase conf/regionservers file, as you would
list datanodes and node managers in the Hadoop etc/hadoop/slaves file. Start and stop
scripts are like those in Hadoop and use the same SSH-based mechanism for running re-
mote commands. A cluster's site-specific configuration is done in the HBase conf/hbase-
site.xml and conf/hbase-env.sh files, which have the same format as their equivalents in
the Hadoop parent project (see Chapter 10 ) .
NOTE
Where there is commonality to be found, whether in a service or type, HBase typically directly uses or
subclasses the parent Hadoop implementation. When this is not possible, HBase will follow the Hadoop
model where it can. For example, HBase uses the Hadoop configuration system, so configuration files
have the same format. What this means for you, the user, is that you can leverage any Hadoop familiarity
in your exploration of HBase. HBase deviates from this rule only when adding its specializations.
HBase persists data via the Hadoop filesystem API. Most people using HBase run it on
HDFS for storage, though by default, and unless told otherwise, HBase writes to the local
filesystem. The local filesystem is fine for experimenting with your initial HBase install,
but thereafter, the first configuration made in an HBase cluster usually involves pointing
HBase at the HDFS cluster it should use.
HBase in operation
Internally, HBase keeps a special catalog table named hbase:meta , within which it
maintains the current list, state, and locations of all user-space regions afloat on the
cluster. Entries in hbase:meta are keyed by region name, where a region name is made
up of the name of the table the region belongs to, the region's start row, its time of cre-
ation, and finally, an MD5 hash of all of these (i.e., a hash of table name, start row, and
creation timestamp). Here is an example region name for a region in the table TestT-
able whose start row is xyz :
TestTable,xyz,1279729913622.1b6e176fb8d8aa88fd4ab6bc80247ece.
Commas delimit the table name, start row, and timestamp. The MD5 hash is surrounded
by a leading and trailing period.
Search WWH ::




Custom Search