This simple use case illustrates several important points. First, an HBase table
can realistically reach a billion rows and millions of columns. As of February
2014, more than 920 million websites had been identified [32]. Second, the row
needs to be defined based on how the data will be accessed. An HBase table needs
to be designed with a specific purpose in mind and a well-reasoned plan for how
data will be read and written. Finally, it may be advantageous to use the column
qualifiers themselves to store the data of interest, rather than storing it only in a cell.
In the example, as new hosting websites are established, they become new column
qualifiers.
A second use case is the storage and search of messages. In 2010, Facebook
implemented such a system using HBase. At the time, Facebook's system was
handling more than 15 billion user-to-user messages per month and 120 billion
chat messages per month [33]. The following describes Facebook's approach to
building a search index for user inboxes. To index each word in each user's
messages, an HBase table was designed as follows:
• The row was defined to be the user ID.
• The column qualifier was set to a word that appears in the message.
• The version was the message ID.
• The cell's content was the offset of the word in the message.
This implementation allowed Facebook to provide auto-complete capability in the
search box and to return the results of the query quickly, with the most recent
messages at the top. As long as the message IDs increase over time, the versions,
stored in descending order, ensure that the most recent messages are returned first
to the user [34].
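The table layout above can be sketched with a small in-memory model. This is only an illustration of the data layout, not Facebook's implementation and not the HBase API: the class name InboxIndex, its methods, and the sample user and message IDs are all hypothetical, and whitespace tokenization is an assumption made to keep the example short.

```python
from collections import defaultdict


class InboxIndex:
    """Toy stand-in for the search-index table (hypothetical, not HBase code)."""

    def __init__(self):
        # row (user ID) -> column qualifier (word) -> {version (message ID): offset}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def index_message(self, user_id, message_id, text):
        lowered = text.lower()
        pos = 0
        for word in lowered.split():  # assumption: words are whitespace-separated
            start = lowered.index(word, pos)
            pos = start + len(word)
            # The cell's content is the offset of the word's first occurrence.
            self._rows[user_id][word].setdefault(message_id, start)

    def search(self, user_id, word):
        # Versions are returned in descending order, so the newest message is first.
        versions = self._rows.get(user_id, {}).get(word.lower(), {})
        return sorted(versions.items(), reverse=True)

    def autocomplete(self, user_id, prefix):
        # Auto-complete is a prefix match over the row's column qualifiers.
        words = self._rows.get(user_id, {})
        return sorted(w for w in words if w.startswith(prefix.lower()))


index = InboxIndex()
index.index_message("user42", 1001, "Lunch at noon")
index.index_message("user42", 1002, "Moving lunch to one")
print(index.search("user42", "lunch"))     # [(1002, 7), (1001, 0)]
print(index.autocomplete("user42", "lu"))  # ['lunch']
```

Because the message ID serves as the version, sorting versions in descending order is enough to put the most recent match first; no separate timestamp column is needed in this sketch.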
These two use cases help illustrate the importance of the upfront design of the
HBase table based on how the data will be accessed. Also, these examples illustrate
the power of being able to add new columns by adding new column qualifiers,
on demand. In a typical RDBMS implementation, new columns require the
involvement of a DBA to alter the structure of the table.
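The contrast with a fixed schema can be illustrated with another minimal sketch. It assumes the usual HBase arrangement in which column families are declared when the table is created while column qualifiers appear on write; the class SparseTable, the family name "cf", and the row and qualifier names are hypothetical and do not correspond to any HBase API.

```python
class SparseTable:
    """Toy model of a table with fixed column families and on-demand qualifiers."""

    def __init__(self, families):
        self.families = set(families)  # declared up front, like a table definition
        self.rows = {}                 # row key -> {(family, qualifier): value}

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # A new qualifier needs no schema change; it simply appears in the row.
        self.rows.setdefault(row_key, {})[(family, qualifier)] = value

    def qualifiers(self, row_key):
        cells = self.rows.get(row_key, {})
        return sorted(f"{family}:{qualifier}" for family, qualifier in cells)


table = SparseTable(families=["cf"])
table.put("row1", "cf", "host-a.example", "1")
table.put("row1", "cf", "host-b.example", "1")  # new column, no table alteration
print(table.qualifiers("row1"))  # ['cf:host-a.example', 'cf:host-b.example']
```

Rows that never receive a given qualifier store nothing for it, which is why a table of this kind can hold millions of distinct columns without every row paying for them.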
Other HBase Usage Considerations
In addition to the HBase design aspects presented in the use case discussions, the
following considerations are important for a successful implementation.
Java API: Previously, several HBase shell commands and operations
were presented. The shell commands are useful for exploring the data in