In-Depth Information
As noted previously, row keys are sorted, so finding the region that hosts a particular row
is a matter of a lookup to find the largest entry whose key is less than or equal to that of
the requested row key. As regions transition (they are split, disabled, enabled, deleted,
redeployed by the region load balancer, or redeployed due to a regionserver crash), the
catalog table is updated so the state of all regions on the cluster is kept current.
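The rule "largest entry whose key is less than or equal to the requested key" is simply a floor lookup over a map sorted by region start row. The following minimal Java sketch illustrates the idea; the start keys and region names are invented, and this is not HBase's own code:

    import java.util.TreeMap;

    public class RegionLookupSketch {
      public static void main(String[] args) {
        // Map of region start row -> region name, kept in sorted order.
        // These boundaries and names are made up for illustration.
        TreeMap<String, String> regionsByStartRow = new TreeMap<>();
        regionsByStartRow.put("", "region-1");          // first region: empty start key
        regionsByStartRow.put("row-5000", "region-2");
        regionsByStartRow.put("row-9000", "region-3");

        // The hosting region is the one with the largest start row <= the requested row.
        String requestedRow = "row-7654";
        String hostingRegion = regionsByStartRow.floorEntry(requestedRow).getValue();
        System.out.println(requestedRow + " is hosted by " + hostingRegion); // region-2
      }
    }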
Fresh clients connect to the ZooKeeper cluster first to learn the location of
hbase:meta. The client then does a lookup against the appropriate hbase:meta region
to figure out the hosting user-space region and its location. Thereafter, the client
interacts directly with the hosting regionserver.
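You can watch this resolution from the client side. The sketch below uses the client API's RegionLocator to ask which region and regionserver host a given row; it assumes a reachable cluster configured via hbase-site.xml on the classpath, and the table name mytable and row key row-7654 are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionLocation;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WhereIsMyRow {
      public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml, which names the ZooKeeper quorum the client starts from.
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             RegionLocator locator =
                 connection.getRegionLocator(TableName.valueOf("mytable"))) {
          // Resolves the hosting user-space region and its regionserver for this row.
          HRegionLocation location = locator.getRegionLocation(Bytes.toBytes("row-7654"));
          System.out.println("Hosted by " + location.getRegionInfo().getRegionNameAsString()
              + " on " + location.getServerName());
        }
      }
    }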
To save on having to make three round-trips per row operation, clients cache all they learn
while doing lookups for hbase:meta. They cache locations as well as user-space region
start and stop rows, so they can figure out hosting regions themselves without having to
go back to the hbase:meta table. Clients continue to use the cached entries as they
work, until there is a fault. When this happens (i.e., when the region has moved), the
client consults the hbase:meta table again to learn the new location. If the consulted
hbase:meta region has moved, then ZooKeeper is reconsulted.
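As an illustration of this caching behavior (not the real client code), the sketch below keeps cached locations in a map sorted by region start row, checks the cached stop row to decide whether an entry really covers the requested row, and drops an entry when a fault shows the region has moved; the region boundaries and server names are invented:

    import java.util.TreeMap;

    public class LocationCacheSketch {

      // Cached start row, stop row, and hosting server for one region.
      record CachedRegion(String startRow, String stopRow, String server) {
        boolean contains(String row) {
          return row.compareTo(startRow) >= 0
              && (stopRow.isEmpty() || row.compareTo(stopRow) < 0);
        }
      }

      private final TreeMap<String, CachedRegion> byStartRow = new TreeMap<>();

      void learn(CachedRegion region) {
        byStartRow.put(region.startRow(), region);
      }

      // Floor lookup over cached start rows; null means "go back to hbase:meta".
      CachedRegion cachedRegionFor(String row) {
        var entry = byStartRow.floorEntry(row);
        return entry != null && entry.getValue().contains(row) ? entry.getValue() : null;
      }

      // Called when a request faults because the region has moved.
      void invalidate(String row) {
        var entry = byStartRow.floorEntry(row);
        if (entry != null) {
          byStartRow.remove(entry.getKey());
        }
      }

      public static void main(String[] args) {
        LocationCacheSketch cache = new LocationCacheSketch();
        cache.learn(new CachedRegion("", "row-5000", "rs1.example.com:16020"));
        cache.learn(new CachedRegion("row-5000", "", "rs2.example.com:16020"));

        System.out.println(cache.cachedRegionFor("row-7654")); // served from the cache
        cache.invalidate("row-7654");                          // fault: region has moved
        System.out.println(cache.cachedRegionFor("row-7654")); // null -> re-read hbase:meta
      }
    }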
Writes arriving at a regionserver are first appended to a commit log and then added to an
in-memory memstore. When a memstore fills, its content is flushed to the filesystem.
The commit log is hosted on HDFS, so it remains available through a regionserver crash.
When the master notices that a regionserver is no longer reachable, usually because the
server's znode has expired in ZooKeeper, it splits the dead regionserver's commit log by
region. On reassignment and before they reopen for business, regions that were on the
dead regionserver will pick up their just-split files of not-yet-persisted edits and replay
them to bring themselves up to date with the state they had just before the failure.
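As a rough illustration of the write path (not the real regionserver code), the sketch below appends each edit to a stand-in commit log before applying it to an in-memory sorted buffer, and flushes that buffer once it passes a size threshold; the interfaces, names, and threshold are invented, and the real regionserver batches, syncs, and flushes far more carefully:

    import java.io.IOException;
    import java.util.concurrent.ConcurrentSkipListMap;

    public class WritePathSketch {

      // Stand-ins for the HDFS-backed commit log and the flush-file writer.
      interface WalAppender { void append(String row, String cell) throws IOException; }
      interface FlushWriter {
        void writeSorted(ConcurrentSkipListMap<String, String> edits) throws IOException;
      }

      private final WalAppender commitLog;
      private final FlushWriter flusher;
      private final long flushThreshold;

      // In-memory, sorted buffer of not-yet-persisted edits (the "memstore").
      private ConcurrentSkipListMap<String, String> memstore = new ConcurrentSkipListMap<>();
      private long memstoreSize = 0;

      WritePathSketch(WalAppender commitLog, FlushWriter flusher, long flushThreshold) {
        this.commitLog = commitLog;
        this.flusher = flusher;
        this.flushThreshold = flushThreshold;
      }

      void put(String row, String cell) throws IOException {
        commitLog.append(row, cell);   // 1. durable append to the commit log
        memstore.put(row, cell);       // 2. apply to the in-memory memstore
        memstoreSize += row.length() + cell.length();
        if (memstoreSize >= flushThreshold) {
          flush();                     // 3. memstore full: write a new flush file
        }
      }

      private void flush() throws IOException {
        flusher.writeSorted(memstore); // persist the sorted edits as one file
        memstore = new ConcurrentSkipListMap<>();
        memstoreSize = 0;
      }

      public static void main(String[] args) throws IOException {
        WritePathSketch region = new WritePathSketch(
            (row, cell) -> System.out.println("commit log append: " + row),
            edits -> System.out.println("flush " + edits.size() + " sorted edits to a new file"),
            32);                       // tiny threshold for the demo
        region.put("row-1", "value-1");
        region.put("row-2", "value-2");
        region.put("row-3", "value-3"); // pushes the memstore over the threshold
      }
    }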
When reading, the region's memstore is consulted first. If sufficient versions are found
reading memstore alone, the query completes there. Otherwise, flush files are consulted in
order, from newest to oldest, either until versions sufficient to satisfy the query are found
or until we run out of flush files.
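The read path can be pictured as a merge that stops early. In the illustrative sketch below (not regionserver code), each source is simply the list of versions it holds for the requested cell, newest first, with the first source playing the role of the memstore:

    import java.util.ArrayList;
    import java.util.List;

    public class ReadPathSketch {

      static List<String> read(List<List<String>> sourcesNewestFirst, int maxVersions) {
        List<String> result = new ArrayList<>();
        for (List<String> source : sourcesNewestFirst) {
          for (String version : source) {
            result.add(version);
            if (result.size() == maxVersions) {
              return result;          // enough versions found: stop early
            }
          }
        }
        return result;                // ran out of flush files before maxVersions
      }

      public static void main(String[] args) {
        List<String> memstore = List.of("v5");
        List<String> newestFlushFile = List.of("v4", "v3");
        List<String> olderFlushFile = List.of("v2", "v1");

        // Asking for three versions: the memstore supplies one, the newest flush
        // file the other two, and the older flush file is never consulted.
        System.out.println(read(List.of(memstore, newestFlushFile, olderFlushFile), 3));
      }
    }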
A background process compacts flush files once their number has exceeded a threshold,
rewriting many files as one, because the fewer files a read consults, the more performant it
will be. On compaction, the process cleans out versions beyond the schema-configured
maximum and removes deleted and expired cells. A separate process running in the re-
gionserver monitors flush file sizes, splitting the region when they grow in excess of the
configured maximum.
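Reduced to its two triggers, this housekeeping looks like the following illustrative sketch; the threshold values and names are invented, not HBase's defaults:

    public class HousekeepingSketch {

      static final int COMPACTION_FILE_THRESHOLD = 3;                  // made-up value
      static final long MAX_STORE_SIZE_BYTES = 10L * 1024 * 1024 * 1024; // made-up value

      static String decide(int flushFileCount, long storeSizeBytes) {
        if (storeSizeBytes > MAX_STORE_SIZE_BYTES) {
          // Region has outgrown the configured maximum: split it in two.
          return "SPLIT";
        }
        if (flushFileCount > COMPACTION_FILE_THRESHOLD) {
          // Rewrite many flush files as one, discarding versions beyond the
          // schema-configured maximum and deleted or expired cells.
          return "COMPACT";
        }
        return "NOTHING";
      }

      public static void main(String[] args) {
        System.out.println(decide(5, 1L * 1024 * 1024 * 1024));   // COMPACT
        System.out.println(decide(2, 20L * 1024 * 1024 * 1024));  // SPLIT
      }
    }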