for general ad-hoc querying and analysis. In other words, it is important to know
how the HBase table will be used; that understanding guides how the row and the
table should be constructed.
For example, if an HBase table is to store the content of e-mails, the row may be
constructed as the concatenation of an e-mail address and the date sent. Because
HBase stores the table sorted by the row, all e-mails for a given e-mail address
sit next to each other and can be retrieved fairly efficiently. The e-mails in a
certain date range, however, are scattered across every address, so retrieving
them requires reading the whole table and takes much longer. The later discussion
on regions provides more details on how data is stored in HBase.
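The effect of the row design on the two access patterns can be sketched without an HBase cluster. The following is a minimal illustration, not HBase client code: a sorted in-memory map stands in for the table, and the class name EmailRowKeys, the "|" separator, and the yyyyMMdd date format are assumptions chosen for the example. Java strings sort by UTF-16 code unit rather than by byte as in HBase; for the ASCII keys used here the order is the same.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class EmailRowKeys {
    // Hypothetical separator and date layout; neither comes from the text.
    static final char SEP = '|';
    static final DateTimeFormatter FMT = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd

    // Row = e-mail address concatenated with the date sent.
    static String rowKey(String address, LocalDate sent) {
        return address + SEP + sent.format(FMT);
    }

    // Efficient: all rows for one address form one contiguous key range.
    static List<String> byAddress(NavigableMap<String, String> table, String address) {
        String start = address + SEP;
        String stop = address + (char) (SEP + 1); // first key past the prefix
        return new ArrayList<>(table.subMap(start, true, stop, false).values());
    }

    // Slow: the date is the key suffix, so every row has to be examined.
    static List<String> byDateRange(NavigableMap<String, String> table,
                                    LocalDate from, LocalDate to) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : table.entrySet()) {
            String key = e.getKey();
            LocalDate sent = LocalDate.parse(key.substring(key.lastIndexOf(SEP) + 1), FMT);
            if (!sent.isBefore(from) && !sent.isAfter(to)) {
                hits.add(e.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put(rowKey("bob@example.com", LocalDate.of(2024, 1, 5)), "b1");
        table.put(rowKey("alice@example.com", LocalDate.of(2024, 1, 7)), "a2");
        table.put(rowKey("alice@example.com", LocalDate.of(2024, 1, 3)), "a1");
        System.out.println(byAddress(table, "alice@example.com"));
        System.out.println(byDateRange(table, LocalDate.of(2024, 1, 4), LocalDate.of(2024, 1, 6)));
    }
}
```

If date-range retrieval were the dominant access pattern, placing the date first in the row would reverse the trade-off: date ranges would become contiguous and per-address lookups would require the full scan.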
A column in an HBase table is designated by the combination of the column
family and the column qualifier. The column family provides a high-level
grouping for the column qualifiers. In the earlier shipping address example, the
row could contain the order_number, and the order details could be stored
under the column family orders, using column qualifiers such as
shipping_address, billing_address, and order_date. In HBase, a column
is specified as column family:column qualifier. In the example, the column
orders:shipping_address refers to an order's shipping address.
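The column family:column qualifier notation can be illustrated with a small helper. This is a sketch for illustration only; the class ColumnNames and its methods are hypothetical and not part of any HBase API. It splits at the first colon, on the assumption that a family name never contains a colon while a qualifier might.

```java
final class ColumnNames {
    // Build the "family:qualifier" form used in the text.
    static String column(String family, String qualifier) {
        return family + ":" + qualifier;
    }

    // Split a column name into { family, qualifier } at the first colon.
    static String[] split(String column) {
        int i = column.indexOf(':');
        if (i < 0) {
            throw new IllegalArgumentException("expected family:qualifier, got " + column);
        }
        return new String[] { column.substring(0, i), column.substring(i + 1) };
    }

    public static void main(String[] args) {
        String name = column("orders", "shipping_address");
        String[] parts = split(name);
        System.out.println(name + " -> family=" + parts[0] + ", qualifier=" + parts[1]);
    }
}
```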
A cell is the intersection of a row and a column in a table. The version, sometimes
called the timestamp, provides the ability to maintain different values for a cell's
contents in HBase. Although the user can define a custom value for the version
when writing an entry to the table, a typical HBase implementation uses HBase's
default, the current system time. In Java, this timestamp is obtained with
System.currentTimeMillis(), the number of milliseconds since midnight UTC on
January 1, 1970.
Because most reads need only the most recent version of a cell,
the cells are stored in descending order of the version. If the application requires
the cells to be stored and retrieved in ascending order of their creation time, the
approach is to use Long.MAX_VALUE - System.currentTimeMillis()
in Java as the version number. Long.MAX_VALUE is the maximum
value that a long integer can hold in Java. In this case, storage and sorting are still
in descending order of the version values, but because an older cell now carries a
larger version number, the cells are returned oldest first.
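The version arithmetic can be checked with a small simulation. The sketch below does not use the HBase client; a map sorted in descending key order stands in for one cell's versions, and the class name CellVersions and the sample millisecond values are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class CellVersions {
    // Versions are kept in descending order of version number, as described above.
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());

    void put(long version, String value) {
        versions.put(version, value);
    }

    // Values in stored order, i.e. largest version number first.
    List<String> values() {
        return new ArrayList<>(versions.values());
    }

    // Reversed version: an earlier creation time yields a larger number.
    static long reversed(long creationMillis) {
        return Long.MAX_VALUE - creationMillis;
    }

    public static void main(String[] args) {
        // Fixed sample times; in practice these would come from System.currentTimeMillis().
        long earlier = 1_700_000_000_000L;
        long later = earlier + 60_000L;

        CellVersions byTime = new CellVersions();
        byTime.put(earlier, "first write");
        byTime.put(later, "second write");
        System.out.println(byTime.values()); // newest value comes first

        CellVersions byReversed = new CellVersions();
        byReversed.put(reversed(earlier), "first write");
        byReversed.put(reversed(later), "second write");
        System.out.println(byReversed.values()); // oldest value comes first
    }
}
```

The store's sort order never changes in either case; only the numbers fed into it do, which is why subtracting from Long.MAX_VALUE is enough to flip the retrieval order.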
Key type is used to identify whether a particular key corresponds to a write
operation to the HBase table or a delete operation from the table. Technically, a
delete from an HBase table is accomplished with a write to the table. The key type
indicates the purpose of the write. For deletes, a tombstone marker is written to the
table to indicate that all cell versions equal to or older than the specified timestamp