HFiles. The get command is instantaneously processed and the appropriate data
returned to the client.
Over time, as the smaller HFiles accumulate, the worker node runs a major
compaction that merges the smaller HFiles into one large HFile. During the
major compaction, the deleted entries and the tombstone markers are permanently
removed from the files.
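As a rough illustration of what a major compaction accomplishes, the following Python sketch merges several small sorted files into one and drops deleted entries along with their tombstones. It is a toy model, not HBase's implementation. The TOMBSTONE sentinel, the major_compact function, and the representation of an HFile as a list of (row_key, timestamp, value) tuples are all assumptions made for the example. A tombstone is assumed to hide every entry for the same key with an equal or older timestamp.

```python
# Toy model of a major compaction; names and data layout are hypothetical.
TOMBSTONE = object()  # sentinel value that marks a delete


def major_compact(hfiles):
    """Merge small 'HFiles' into one sorted list, dropping deleted data.

    Each input file is a list of (row_key, timestamp, value) tuples.
    """
    # Find the latest tombstone timestamp for each key across all files.
    deleted_at = {}
    for hfile in hfiles:
        for key, ts, value in hfile:
            if value is TOMBSTONE:
                deleted_at[key] = max(ts, deleted_at.get(key, ts))

    merged = []
    for hfile in hfiles:
        for key, ts, value in hfile:
            if value is TOMBSTONE:
                continue  # the tombstone markers themselves are removed
            if key in deleted_at and ts <= deleted_at[key]:
                continue  # entry is masked by a tombstone
            merged.append((key, ts, value))

    # One large file: sorted by key, newest entry first within a key.
    merged.sort(key=lambda entry: (entry[0], -entry[1]))
    return merged


small_files = [
    [("row1", 1, "a"), ("row2", 1, "b")],
    [("row1", 2, TOMBSTONE), ("row3", 2, "c")],
    [("row1", 3, "d")],
]
compacted = major_compact(small_files)
print(compacted)
```

In this sample, the entry for row1 at timestamp 1 and its tombstone at timestamp 2 both disappear, while the later write to row1 at timestamp 3 survives.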
Use Cases for HBase
As described in Google's Bigtable paper, a common use case for a data store such
as HBase is to store the results from a web crawler. In the paper's example, the
row com.cnn.www corresponds to a website URL, www.cnn.com. A
column family called anchor is defined to capture the URLs of the websites that
link to the row's website. What may not be obvious is that
those linking URLs are used as the column qualifiers. For example, if
sportsillustrated.cnn.com provides a link to www.cnn.com, the column
qualifier is sportsillustrated.cnn.com. Additional websites that
link to www.cnn.com appear as additional column qualifiers. The value stored in
the cell is simply the link text (the anchor text) that appears on the linking website. Here is how the
CNN example may look in HBase following a get operation.
hbase> get 'web_table', 'com.cnn.www', {VERSIONS => 2}
COLUMN                             CELL
 anchor:sportsillustrated.cnn.com  timestamp=1380224620597, value=cnn
 anchor:sportsillustrated.cnn.com  timestamp=1380224000001, value=cnn.com
 anchor:edition.cnn.com            timestamp=1380224620597, value=cnn
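One way to picture the layout behind this output is as a nested map: row key, then column (family:qualifier), then timestamp, then value. The Python sketch below is only a conceptual model built from the sample values above. The table dictionary and the get helper are hypothetical and are not an HBase client API.

```python
# Conceptual model only: row -> "family:qualifier" -> {timestamp: value}.
table = {
    "com.cnn.www": {
        "anchor:sportsillustrated.cnn.com": {
            1380224620597: "cnn",
            1380224000001: "cnn.com",
        },
        "anchor:edition.cnn.com": {
            1380224620597: "cnn",
        },
    },
}


def get(table, row_key, versions=1):
    """Return up to `versions` newest cells per column, newest first."""
    result = []
    for column, cells in table.get(row_key, {}).items():
        # Sort the (timestamp, value) pairs so the newest version comes first.
        newest = sorted(cells.items(), reverse=True)[:versions]
        for timestamp, value in newest:
            result.append((column, timestamp, value))
    return result


for column, timestamp, value in get(table, "com.cnn.www", versions=2):
    print(f"{column}  timestamp={timestamp}, value={value}")
```

Asking for a single version returns only the newest cell in each column, which is why the VERSIONS option is needed to see the older cnn.com value.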
Additional results are returned for each website that provides a
link to www.cnn.com. Finally, the use of com.cnn.www
for the row instead of www.cnn.com requires an explanation. HBase stores rows
sorted by row key, so reversing the URLs groups the rows first by the Internet's
top-level domains (.com, .gov, or .net) and then by the next part of the
domain name (cnn).
As a result, all of the cnn.com websites could be retrieved by a scan with a STARTROW of
com.cnn and the appropriate STOPROW.
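The reversed row key and the range scan can be sketched in a few lines of Python. This is a minimal illustration that assumes ASCII hostnames. The helper names to_row_key and scan, the sample hostnames other than the cnn.com ones, and the stop row com.cnn/ are assumptions for the example rather than anything prescribed by HBase. The scan helper treats the start row as inclusive and the stop row as exclusive, which matches the default behavior of an HBase scan.

```python
def to_row_key(hostname):
    """Reverse the dot-separated labels: 'www.cnn.com' -> 'com.cnn.www'."""
    return ".".join(reversed(hostname.split(".")))


def scan(row_keys, start_row, stop_row):
    """Mimic a range scan: start_row is inclusive, stop_row is exclusive."""
    return [key for key in sorted(row_keys) if start_row <= key < stop_row]


hosts = [
    "www.cnn.com",
    "edition.cnn.com",
    "sportsillustrated.cnn.com",
    "www.nasa.gov",   # hypothetical extra rows to show what gets excluded
    "www.cnet.com",
]
row_keys = [to_row_key(host) for host in hosts]

# Hypothetical stop row: '/' is the character right after '.' in ASCII, so
# 'com.cnn/' sorts just past every key that begins with 'com.cnn.'.
cnn_rows = scan(row_keys, "com.cnn", "com.cnn/")
print(cnn_rows)
```

Because the keys are sorted, the three cnn.com rows sit next to one another and the scan never touches the com.cnet or gov.nasa rows.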