for general ad-hoc querying and analysis. In other words, it is important to know
how the HBase table will be used; understanding that usage helps in designing the
row and the table optimally.
For example, if an HBase table is to store the content of e-mails, the row may be
constructed as the concatenation of an e-mail address and the date sent. Because
HBase stores a table sorted by row, retrieving the e-mails for a given e-mail
address will be fairly efficient. Retrieving all e-mails within a certain date
range will take much longer, because those rows are spread across the whole table.
The later discussion on regions provides more details on how data is stored in HBase.
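To make the e-mail example concrete, the following Java sketch models an HBase table as a java.util.TreeMap, which, like HBase, keeps rows sorted by key. It is not the HBase client API. The row-key layout (address, a '|' separator, then an ISO yyyy-MM-dd date) and the class and method names are hypothetical choices for illustration. Looking up one address reads a single contiguous key range, while a date-range query has to examine every row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model: a TreeMap stands in for an HBase table because
// both keep rows sorted by row key. This is NOT the HBase client API.
class EmailTableSketch {
    // Assumed row-key layout: "<address>|<yyyy-MM-dd>". ISO dates compare
    // correctly as strings; addresses are assumed not to contain '|'.
    static String rowKey(String address, String date) {
        return address + "|" + date;
    }

    final TreeMap<String, String> rows = new TreeMap<>();

    void put(String address, String date, String body) {
        rows.put(rowKey(address, date), body);
    }

    // All rows for one address share the prefix "<address>|", so they form one
    // contiguous range: ['<address>|', '<address>}') since '}' follows '|' in ASCII.
    List<String> byAddress(String address) {
        String start = address + "|";
        String stop = address + "}";
        return new ArrayList<>(rows.subMap(start, true, stop, false).values());
    }

    // The date is not a key prefix, so every row must be visited and filtered.
    List<String> byDateRange(String fromDate, String toDate) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : rows.entrySet()) {
            String key = e.getKey();
            String date = key.substring(key.lastIndexOf('|') + 1);
            if (date.compareTo(fromDate) >= 0 && date.compareTo(toDate) <= 0) {
                result.add(e.getValue());
            }
        }
        return result;
    }
}
```

In a real HBase table, the address lookup would roughly correspond to a scan bounded by start and stop rows. The date query would correspond to a full-table scan with a filter. The sketch only mimics that difference in cost.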
A column in an HBase table is designated by the combination of the column
family and the column qualifier. The column family provides a high-level
grouping for the column qualifiers. In the earlier shipping address example, the
row could contain the order_number, and the order details could be stored
under the column family orders, using column qualifiers such as
shipping_address, billing_address, and order_date. In HBase, a column
is specified as column family:column qualifier. In the example, the column
orders:shipping_address refers to an order's shipping address.
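The family:qualifier naming can be sketched with nested maps. A row maps each column family to its qualifiers, and each qualifier maps to a value. The class name, the row value order_1001, and the sample values are hypothetical. Plain Java collections stand in for HBase storage.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of one row: column family -> column qualifier -> value.
// Plain Java collections only; this is not the HBase client API.
class OrderRowSketch {
    final String rowKey;
    final Map<String, Map<String, String>> families = new TreeMap<>();

    OrderRowSketch(String rowKey) {
        this.rowKey = rowKey;
    }

    // Splits "family:qualifier" at the first colon.
    static String[] splitColumn(String column) {
        int i = column.indexOf(':');
        if (i < 0) {
            throw new IllegalArgumentException("expected family:qualifier, got " + column);
        }
        return new String[] { column.substring(0, i), column.substring(i + 1) };
    }

    // Creates the family's qualifier map on first use, then stores the value.
    void put(String column, String value) {
        String[] fq = splitColumn(column);
        families.computeIfAbsent(fq[0], f -> new TreeMap<>()).put(fq[1], value);
    }

    // Returns null when the family or the qualifier is absent.
    String get(String column) {
        String[] fq = splitColumn(column);
        Map<String, String> qualifiers = families.get(fq[0]);
        return qualifiers == null ? null : qualifiers.get(fq[1]);
    }
}
```

For example, put("orders:shipping_address", "12 Main St") stores the value under family orders and qualifier shipping_address. Any other qualifiers in orders are grouped alongside it.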
A cell is the intersection of a row and a column in a table. The version,
sometimes called the timestamp, provides the ability to maintain different
values for a cell's contents in HBase. Although the user can define a custom
value for the version when writing an entry to the table, a typical HBase
implementation uses HBase's default, the current system time. In Java, this
timestamp is obtained with System.currentTimeMillis(), which returns the number
of milliseconds since January 1, 1970 (UTC).
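A single cell holding several versions can be modeled as a map from version number to value. In this hypothetical Java sketch, versions are kept newest first. A write without an explicit version falls back to System.currentTimeMillis(), and asOf() returns the newest value at or before a given version. The class and method names are illustrative only; this is not the HBase API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one cell that keeps several versions of its value.
// A TreeMap sorted newest-first stands in for HBase storage.
class VersionedCellSketch {
    private final NavigableMap<Long, String> versions =
            new TreeMap<>(Collections.reverseOrder());

    // Write with a caller-supplied version number.
    void put(long version, String value) {
        versions.put(version, value);
    }

    // Write with the default version: milliseconds since January 1, 1970 (UTC).
    void put(String value) {
        put(System.currentTimeMillis(), value);
    }

    // The newest version is the first entry in the descending map.
    String latest() {
        Map.Entry<Long, String> e = versions.firstEntry();
        return e == null ? null : e.getValue();
    }

    // Newest value whose version is <= asOf. With a descending comparator,
    // ceilingEntry() finds the first key at or "after" asOf in that order,
    // i.e. the largest version not exceeding asOf.
    String asOf(long asOf) {
        Map.Entry<Long, String> e = versions.ceilingEntry(asOf);
        return e == null ? null : e.getValue();
    }
}
```

After put(100L, "v100") and put(200L, "v200"), latest() yields "v200" and asOf(150L) yields "v100".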
Because often only the most recent version of a cell is required, cells are
stored in descending order of version. If the application requires the cells
to be stored and retrieved in ascending order of their creation time, the
approach is to use Long.MAX_VALUE - System.currentTimeMillis() in Java as the
version number. Long.MAX_VALUE is the maximum value a long integer can hold in
Java. In this case, the storing and sorting are still in descending order of the
version values. However, older writes now receive larger version numbers, so the
cells come back oldest first.
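The effect of the Long.MAX_VALUE technique can be checked with a small sketch. The hypothetical readOrder() method below always sorts versions in descending order; only the version numbers fed to it change. With plain timestamps the newest value comes back first, and with reverse timestamps the oldest value comes back first. The names and sample times are illustrative assumptions, not HBase API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: storage always sorts versions in descending order;
// only the choice of version number differs between the two modes.
class ReverseTimestampSketch {
    // Reverse timestamp: older writes get larger version numbers.
    // No overflow for non-negative millisecond values.
    static long reverseVersion(long creationMillis) {
        return Long.MAX_VALUE - creationMillis;
    }

    // Returns the values in the order a descending-by-version store yields them.
    static List<String> readOrder(long[] creationMillis, String[] values, boolean reverse) {
        TreeMap<Long, String> store = new TreeMap<>(Collections.reverseOrder());
        for (int i = 0; i < values.length; i++) {
            long version = reverse ? reverseVersion(creationMillis[i]) : creationMillis[i];
            store.put(version, values[i]);
        }
        return new ArrayList<>(store.values());
    }
}
```

Take values written at times 1000, 2000, and 3000. The plain versions read back as third, second, first. The reverse versions read back as first, second, third.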
Key type is used to identify whether a particular key corresponds to a write
operation to the HBase table or a delete operation from the table. Technically, a
delete from an HBase table is accomplished with a write to the table. The key type
indicates the purpose of the write. For deletes, a tombstone marker is written to the
table to indicate that all cell versions equal to or older than the specified timestamp