and a relational database, this section presents considerable details about the
implementation and use of HBase.
The HBase design is based on Google's 2006 paper on Bigtable. This paper
described Bigtable as a “distributed storage system for managing structured data.”
Google used Bigtable to store product-specific data for services such as Google
Earth, which provides satellite images of the world. Bigtable was also used to
store web crawler results, data for personalized search optimization, and website
clickstream data. Bigtable was built on top of the Google File System, and MapReduce
jobs were used to read data from and write data to Bigtable. For example, raw
clickstream data was stored in one Bigtable. A scheduled MapReduce job would
periodically process and summarize the newly added clickstream data and append
the results to a second Bigtable [27].
The development of HBase began in 2006. HBase was included as part of a Hadoop
distribution at the end of 2007. In May 2010, HBase became an Apache Top-Level
Project. Later in 2010, Facebook began to use HBase for its user messaging
infrastructure, which accommodated 350 million users sending 15 billion messages
per month [28].
HBase Architecture and Data Model
HBase is a data store intended to be distributed across a cluster of nodes.
Like Hadoop and many of its related Apache projects, HBase is built on HDFS
and achieves its real-time access speeds by spreading the workload across a large
number of nodes in a distributed cluster. An HBase table consists of rows and
columns. However, an HBase table also has a third dimension, version, which keeps
the different values stored at a given row and column intersection over time.
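To make the row, column, and version dimensions concrete, here is a minimal in-memory sketch in Python. It is not the HBase client API: the class `VersionedTable` and its `put`, `get`, and `versions` methods are hypothetical names, and timestamps are assumed to be plain integers supplied by the caller. Each (row, column) pair maps to a set of timestamped values, so writing the same cell at a new timestamp adds a version instead of overwriting the previous one.

```python
from collections import defaultdict
from typing import Optional


class VersionedTable:
    """Toy model of a table addressed by (row, column, timestamp).

    Hypothetical illustration only; this is not the HBase client API.
    Timestamps are assumed to be integers chosen by the caller.
    """

    def __init__(self) -> None:
        # (row, column) -> {timestamp: value}
        self._cells: dict[tuple[str, str], dict[int, str]] = defaultdict(dict)

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        # A write at a new timestamp adds a version; earlier versions are kept.
        self._cells[(row, column)][ts] = value

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[str]:
        # Without ts, return the newest version.
        # With ts, return the newest version written at or before ts.
        versions = self._cells.get((row, column), {})
        candidates = [t for t in versions if ts is None or t <= ts]
        if not candidates:
            return None
        return versions[max(candidates)]

    def versions(self, row: str, column: str) -> list[tuple[int, str]]:
        # Every stored version of one cell, newest first.
        cell = self._cells.get((row, column), {})
        return sorted(cell.items(), reverse=True)


if __name__ == "__main__":
    table = VersionedTable()
    table.put("row1", "colA", "v1", ts=100)
    table.put("row1", "colA", "v2", ts=200)
    print(table.get("row1", "colA"))          # v2
    print(table.get("row1", "colA", ts=150))  # v1
    print(table.versions("row1", "colA"))     # [(200, 'v2'), (100, 'v1')]
```

Returning the newest version by default mirrors how a read without an explicit timestamp is usually treated in HBase, but version retention limits, deletes, and the on-disk layout of a real cluster are not modeled in this sketch.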
To illustrate this third dimension, consider an online retailer that stores several
shipping addresses for each customer. The row is identified by a customer number,
and one column holds the shipping address. Each address value is stored at the
intersection of the customer number and the shipping address column, along with a
timestamp corresponding to when the customer last used that shipping address.
During checkout at an online retailer, the website might use such a table to
retrieve and display the customer's previous shipping addresses. As shown in
Figure 10.6, the customer can then select the appropriate address, add a new
address, or delete any addresses that are no longer relevant.
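The checkout scenario can be sketched as a small in-memory model. This is a hypothetical illustration, not HBase code: `ShippingAddressStore`, `use_address`, `previous_addresses`, and `delete_address` are invented names. It assumes integer timestamps supplied by the caller and keeps each distinct address once, stamped with its most recent use. Each customer number is a row, `shipping_address` is the single column, and the versions of that cell are the addresses. Listing versions newest first gives the order in which a checkout page might display previous addresses.

```python
class ShippingAddressStore:
    """Hypothetical model of the checkout example.

    Row = customer number, column = "shipping_address",
    versions = addresses keyed by the integer timestamp of their last use.
    Assumes distinct timestamps per customer; a second address written at an
    already-used timestamp would replace the first.
    """

    COLUMN = "shipping_address"

    def __init__(self) -> None:
        # (customer_number, column) -> {timestamp: address}
        self._cells: dict[tuple[str, str], dict[int, str]] = {}

    def use_address(self, customer: str, address: str, ts: int) -> None:
        # Record that the customer used this address at time ts.
        # An address already on file is re-stamped, so its timestamp
        # always reflects its most recent use.
        versions = self._cells.setdefault((customer, self.COLUMN), {})
        for old_ts in [t for t, a in versions.items() if a == address]:
            del versions[old_ts]
        versions[ts] = address

    def previous_addresses(self, customer: str) -> list[str]:
        # Addresses to display at checkout, most recently used first.
        versions = self._cells.get((customer, self.COLUMN), {})
        return [versions[t] for t in sorted(versions, reverse=True)]

    def delete_address(self, customer: str, address: str) -> None:
        # Remove an address the customer no longer needs.
        versions = self._cells.get((customer, self.COLUMN), {})
        for old_ts in [t for t, a in versions.items() if a == address]:
            del versions[old_ts]


if __name__ == "__main__":
    store = ShippingAddressStore()
    store.use_address("C1001", "12 Elm St", ts=1)
    store.use_address("C1001", "9 Oak Ave", ts=2)
    store.use_address("C1001", "12 Elm St", ts=3)  # reused, so it moves to the top
    print(store.previous_addresses("C1001"))  # ['12 Elm St', '9 Oak Ave']
    store.delete_address("C1001", "9 Oak Ave")
    print(store.previous_addresses("C1001"))  # ['12 Elm St']
```

Re-stamping a reused address, instead of keeping every past use as its own version, is a simplification chosen so the displayed list has no duplicates. A real HBase table would record a new version on each write and leave de-duplication to the application.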