Database Reference
In-Depth Information
on the assignment where the crashed server left off. At a minimum, when bootstrapping a
client connection to an HBase cluster, the client must be passed the location of the
ZooKeeper ensemble. Thereafter, the client navigates the ZooKeeper hierarchy to learn
cluster attributes such as server locations.
Regionserver worker nodes are listed in the HBase
conf/regionservers
file, as you would
list datanodes and node managers in the Hadoop
etc/hadoop/slaves
file. Start and stop
scripts are like those in Hadoop and use the same SSH-based mechanism for running re-
mote commands. A cluster's site-specific configuration is done in the HBase
conf/hbase-
site.xml
and
conf/hbase-env.sh
files, which have the same format as their equivalents in
the Hadoop parent project (see
Chapter 10
)
.
NOTE
Where there is commonality to be found, whether in a service or type, HBase typically directly uses or
subclasses the parent Hadoop implementation. When this is not possible, HBase will follow the Hadoop
model where it can. For example, HBase uses the Hadoop configuration system, so configuration files
have the same format. What this means for you, the user, is that you can leverage any Hadoop familiarity
in your exploration of HBase. HBase deviates from this rule only when adding its specializations.
HBase persists data via the Hadoop filesystem API. Most people using HBase run it on
HDFS for storage, though by default, and unless told otherwise, HBase writes to the local
filesystem. The local filesystem is fine for experimenting with your initial HBase install,
but thereafter, the first configuration made in an HBase cluster usually involves pointing
HBase at the HDFS cluster it should use.
HBase in operation
Internally, HBase keeps a special catalog table named
hbase:meta
, within which it
maintains the current list, state, and locations of all user-space regions afloat on the
cluster. Entries in
hbase:meta
are keyed by region name, where a region name is made
up of the name of the table the region belongs to, the region's start row, its time of cre-
ation, and finally, an MD5 hash of all of these (i.e., a hash of table name, start row, and
creation timestamp). Here is an example region name for a region in the table
TestT-
able
whose start row is
xyz
:
TestTable,xyz,1279729913622.1b6e176fb8d8aa88fd4ab6bc80247ece.
Commas delimit the table name, start row, and timestamp. The MD5 hash is surrounded
by a leading and trailing period.