In-Depth Information
As noted previously, row keys are sorted, so finding the region that hosts a particular row
is a matter of a lookup to find the largest entry whose key is less than or equal to that of
the requested row key. As regions transition (they are split, disabled, enabled, deleted,
redeployed by the region load balancer, or redeployed due to a regionserver crash), the
catalog table is updated so the state of all regions on the cluster is kept current.
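The rule "largest entry whose key is less than or equal to the requested key" is simply a floor lookup over a map sorted by region start row. The following minimal Java sketch illustrates the idea; the start keys and region names are invented, and this is not HBase's own code:

    import java.util.TreeMap;

    public class RegionLookupSketch {
      public static void main(String[] args) {
        // Map of region start row -> region name, kept in sorted order.
        // These boundaries and names are made up for illustration.
        TreeMap<String, String> regionsByStartRow = new TreeMap<>();
        regionsByStartRow.put("", "region-1");          // first region: empty start key
        regionsByStartRow.put("row-5000", "region-2");
        regionsByStartRow.put("row-9000", "region-3");

        // The hosting region is the one with the largest start row <= the requested row.
        String requestedRow = "row-7654";
        String hostingRegion = regionsByStartRow.floorEntry(requestedRow).getValue();
        System.out.println(requestedRow + " is hosted by " + hostingRegion); // region-2
      }
    }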
Fresh clients connect to the ZooKeeper cluster first to learn the location of
hbase:meta. The client then does a lookup against the appropriate hbase:meta region
to figure out the hosting user-space region and its location. Thereafter, the client
interacts directly with the hosting regionserver.
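You can watch this resolution from the client side. The sketch below uses the client API's RegionLocator to ask which region and regionserver host a given row; it assumes a reachable cluster configured via hbase-site.xml on the classpath, and the table name mytable and row key row-7654 are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionLocation;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WhereIsMyRow {
      public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml, which names the ZooKeeper quorum the client starts from.
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             RegionLocator locator =
                 connection.getRegionLocator(TableName.valueOf("mytable"))) {
          // Resolves the hosting user-space region and its regionserver for this row.
          HRegionLocation location = locator.getRegionLocation(Bytes.toBytes("row-7654"));
          System.out.println("Hosted by " + location.getRegionInfo().getRegionNameAsString()
              + " on " + location.getServerName());
        }
      }
    }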
To save on having to make three round-trips per row operation, clients cache all they learn
while doing lookups for hbase:meta. They cache locations as well as user-space region
start and stop rows, so they can figure out hosting regions themselves without having to
go back to the hbase:meta table. Clients continue to use the cached entries as they
work, until there is a fault. When this happens (i.e., when the region has moved), the
client consults the hbase:meta table again to learn the new location. If the consulted
hbase:meta region has moved, then ZooKeeper is reconsulted.
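As an illustration of this caching behavior (not the real client code), the sketch below keeps cached locations in a map sorted by region start row, checks the cached stop row to decide whether an entry really covers the requested row, and drops an entry when a fault shows the region has moved; the region boundaries and server names are invented:

    import java.util.TreeMap;

    public class LocationCacheSketch {

      // Cached start row, stop row, and hosting server for one region.
      record CachedRegion(String startRow, String stopRow, String server) {
        boolean contains(String row) {
          return row.compareTo(startRow) >= 0
              && (stopRow.isEmpty() || row.compareTo(stopRow) < 0);
        }
      }

      private final TreeMap<String, CachedRegion> byStartRow = new TreeMap<>();

      void learn(CachedRegion region) {
        byStartRow.put(region.startRow(), region);
      }

      // Floor lookup over cached start rows; null means "go back to hbase:meta".
      CachedRegion cachedRegionFor(String row) {
        var entry = byStartRow.floorEntry(row);
        return entry != null && entry.getValue().contains(row) ? entry.getValue() : null;
      }

      // Called when a request faults because the region has moved.
      void invalidate(String row) {
        var entry = byStartRow.floorEntry(row);
        if (entry != null) {
          byStartRow.remove(entry.getKey());
        }
      }

      public static void main(String[] args) {
        LocationCacheSketch cache = new LocationCacheSketch();
        cache.learn(new CachedRegion("", "row-5000", "rs1.example.com:16020"));
        cache.learn(new CachedRegion("row-5000", "", "rs2.example.com:16020"));

        System.out.println(cache.cachedRegionFor("row-7654")); // served from the cache
        cache.invalidate("row-7654");                          // fault: region has moved
        System.out.println(cache.cachedRegionFor("row-7654")); // null -> re-read hbase:meta
      }
    }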
Writes arriving at a regionserver are first appended to a commit log and then added to an
in-memory memstore. When a memstore fills, its content is flushed to the filesystem.
The commit log is hosted on HDFS, so it remains available through a regionserver crash.
When the master notices that a regionserver is no longer reachable, usually because the
server's znode has expired in ZooKeeper, it splits the dead regionserver's commit log by
region. On reassignment and before they reopen for business, regions that were on the
dead regionserver will pick up their just-split files of not-yet-persisted edits and replay
them to bring themselves up to date with the state they had just before the failure.
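As a rough illustration of the write path (not the real regionserver code), the sketch below appends each edit to a stand-in commit log before applying it to an in-memory sorted buffer, and flushes that buffer once it passes a size threshold; the interfaces, names, and threshold are invented, and the real regionserver batches, syncs, and flushes far more carefully:

    import java.io.IOException;
    import java.util.concurrent.ConcurrentSkipListMap;

    public class WritePathSketch {

      // Stand-ins for the HDFS-backed commit log and the flush-file writer.
      interface WalAppender { void append(String row, String cell) throws IOException; }
      interface FlushWriter {
        void writeSorted(ConcurrentSkipListMap<String, String> edits) throws IOException;
      }

      private final WalAppender commitLog;
      private final FlushWriter flusher;
      private final long flushThreshold;

      // In-memory, sorted buffer of not-yet-persisted edits (the "memstore").
      private ConcurrentSkipListMap<String, String> memstore = new ConcurrentSkipListMap<>();
      private long memstoreSize = 0;

      WritePathSketch(WalAppender commitLog, FlushWriter flusher, long flushThreshold) {
        this.commitLog = commitLog;
        this.flusher = flusher;
        this.flushThreshold = flushThreshold;
      }

      void put(String row, String cell) throws IOException {
        commitLog.append(row, cell);   // 1. durable append to the commit log
        memstore.put(row, cell);       // 2. apply to the in-memory memstore
        memstoreSize += row.length() + cell.length();
        if (memstoreSize >= flushThreshold) {
          flush();                     // 3. memstore full: write a new flush file
        }
      }

      private void flush() throws IOException {
        flusher.writeSorted(memstore); // persist the sorted edits as one file
        memstore = new ConcurrentSkipListMap<>();
        memstoreSize = 0;
      }

      public static void main(String[] args) throws IOException {
        WritePathSketch region = new WritePathSketch(
            (row, cell) -> System.out.println("commit log append: " + row),
            edits -> System.out.println("flush " + edits.size() + " sorted edits to a new file"),
            32);                       // tiny threshold for the demo
        region.put("row-1", "value-1");
        region.put("row-2", "value-2");
        region.put("row-3", "value-3"); // pushes the memstore over the threshold
      }
    }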
When reading, the region's memstore is consulted first. If sufficient versions are found
reading memstore alone, the query completes there. Otherwise, flush files are consulted in
order, from newest to oldest, either until versions sufficient to satisfy the query are found
or until we run out of flush files.
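The read path can be pictured as a merge that stops early. In the illustrative sketch below (not regionserver code), each source is simply the list of versions it holds for the requested cell, newest first, with the first source playing the role of the memstore:

    import java.util.ArrayList;
    import java.util.List;

    public class ReadPathSketch {

      static List<String> read(List<List<String>> sourcesNewestFirst, int maxVersions) {
        List<String> result = new ArrayList<>();
        for (List<String> source : sourcesNewestFirst) {
          for (String version : source) {
            result.add(version);
            if (result.size() == maxVersions) {
              return result;          // enough versions found: stop early
            }
          }
        }
        return result;                // ran out of flush files before maxVersions
      }

      public static void main(String[] args) {
        List<String> memstore = List.of("v5");
        List<String> newestFlushFile = List.of("v4", "v3");
        List<String> olderFlushFile = List.of("v2", "v1");

        // Asking for three versions: the memstore supplies one, the newest flush
        // file the other two, and the older flush file is never consulted.
        System.out.println(read(List.of(memstore, newestFlushFile, olderFlushFile), 3));
      }
    }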
A background process compacts flush files once their number has exceeded a threshold,
rewriting many files as one, because the fewer files a read consults, the more performant it
will be. On compaction, the process cleans out versions beyond the schema-configured
maximum and removes deleted and expired cells. A separate process running in the re-
gionserver monitors flush file sizes, splitting the region when they grow in excess of the
configured maximum.
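Reduced to its two triggers, this housekeeping looks like the following illustrative sketch; the threshold values and names are invented, not HBase's defaults:

    public class HousekeepingSketch {

      static final int COMPACTION_FILE_THRESHOLD = 3;                  // made-up value
      static final long MAX_STORE_SIZE_BYTES = 10L * 1024 * 1024 * 1024; // made-up value

      static String decide(int flushFileCount, long storeSizeBytes) {
        if (storeSizeBytes > MAX_STORE_SIZE_BYTES) {
          // Region has outgrown the configured maximum: split it in two.
          return "SPLIT";
        }
        if (flushFileCount > COMPACTION_FILE_THRESHOLD) {
          // Rewrite many flush files as one, discarding versions beyond the
          // schema-configured maximum and deleted or expired cells.
          return "COMPACT";
        }
        return "NOTHING";
      }

      public static void main(String[] args) {
        System.out.println(decide(5, 1L * 1024 * 1024 * 1024));   // COMPACT
        System.out.println(decide(2, 20L * 1024 * 1024 * 1024));  // SPLIT
      }
    }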