Using HBase
Although HDFS is excellent at storing large amounts of data, and although
MapReduce jobs and tools such as Hive and Pig are well suited for reading
and aggregating large amounts of data, they are not very efficient when
it comes to individual record lookups or updating the data. This is where
HBase comes into play.
HBase is classified as a NoSQL database. Unlike traditional relational
databases such as SQL Server or Oracle, NoSQL databases generally do not
attempt to provide full ACID (atomicity, consistency, isolation, durability)
transactional reliability across multiple records. HBase, for example,
guarantees atomicity only within a single row. Instead, NoSQL databases are
tuned to handle large amounts of unstructured data, providing fast key-based
lookups and updates.
As mentioned previously, HBase is a key/value columnar storage system.
The key is what provides fast access to the value for retrieval and updating.
An HBase table consists of a set of pointers to the cell values. Each pointer
is made up of a row key, a column key, and a version key (a timestamp). Using
this key structure, the rows that make up a table are stored in regions, which
are distributed across region servers. As the data grows, the regions are
automatically split and redistributed. Because HBase uses HDFS as its storage
layer, it relies on HDFS to supply services such as automatic replication and
failover.
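To make the addressing scheme concrete, the following is a minimal in-memory sketch in Python. It is not the HBase client API. It assumes a hypothetical `ToyTable` class in which each value is stored under a (row key, column key, version) tuple. As in HBase, the version defaults to the current time in milliseconds, and a read without an explicit version returns the newest value. The `split_point` method is also a simplification. It only illustrates that, because rows are ordered by row key, a region can be divided into two contiguous key ranges.

```python
import time
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory model of HBase cell addressing (NOT the HBase API).
# A cell is located by (row key, column key, version); the column key is
# written "family:qualifier" and the version is a timestamp in milliseconds.
CellKey = Tuple[str, str, int]


class ToyTable:
    def __init__(self) -> None:
        self._cells: Dict[CellKey, bytes] = {}

    def put(self, row: str, column: str, value: bytes,
            version: Optional[int] = None) -> None:
        # Like HBase, default the version to the current time in milliseconds.
        if version is None:
            version = int(time.time() * 1000)
        self._cells[(row, column, version)] = value

    def get(self, row: str, column: str,
            version: Optional[int] = None) -> Optional[bytes]:
        # Collect every stored version of this row/column pair.
        versions = [v for (r, c, v) in self._cells if r == row and c == column]
        if not versions:
            return None
        # Without an explicit version, return the newest (highest timestamp).
        if version is None:
            version = max(versions)
        return self._cells.get((row, column, version))

    def split_point(self) -> Optional[str]:
        # Rows are ordered by row key, so a region can be divided into two
        # contiguous key ranges. This toy version splits at the median row
        # key; real HBase chooses its split key by its own rules.
        rows: List[str] = sorted({r for (r, _, _) in self._cells})
        return rows[len(rows) // 2] if rows else None


if __name__ == "__main__":
    t = ToyTable()
    t.put("web01", "log:status", b"ok", version=1)
    t.put("web01", "log:status", b"down", version=2)
    print(t.get("web01", "log:status"))     # b'down' (newest version)
    print(t.get("web01", "log:status", 1))  # b'ok' (explicit older version)
```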
Because HBase relies so heavily on keys for its performance, key design is a
very important consideration when defining tables. In the next section, you
will look at creating HBase tables and defining appropriate keys for them.
Creating HBase Tables
Because the keys are so important for retrieving or updating data quickly,
they are the most important consideration when setting up an HBase table.
How you design the keys depends a great deal on how the data is accessed. If
data is accessed through single-cell lookups, a randomized key structure works
best. If you retrieve data based on buckets (for example, logs from a certain
server), you should include the bucket in the key. If you further look up
values based on log event type or date ranges, these should also be part of
the key. The order of the key attributes is important, because HBase stores
rows sorted by row key: rows that share a leading part of the key are stored
together and can be read with a single range scan. If lookups are based
primarily on server and then on event type, the key should be made up of
Server-Event-Timestamp.
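The key-design guidance above can be sketched in Python as follows. The helper names (`composite_key`, `prefix_scan`, `hashed_key`), the `|` delimiter, the 13-digit timestamp padding, and the bucket count are illustrative assumptions, not part of HBase. The sketch relies on one real property of HBase: row keys are sorted lexicographically by their bytes. A key built as server, then event type, then timestamp therefore keeps all rows for a server (and, within it, an event type) in one contiguous range. For pure single-cell lookups, `hashed_key` shows one common randomizing technique: prefixing a hash-derived bucket so that otherwise sequential keys spread across regions.

```python
import bisect
import hashlib
from typing import List

SEP = "|"  # hypothetical delimiter; must not occur inside any key field


def composite_key(server: str, event_type: str, timestamp_ms: int) -> str:
    # Fields are ordered the way lookups narrow the data: server, event, time.
    # Zero-padding the timestamp makes lexicographic order (HBase's row-key
    # order) agree with numeric time order.
    return f"{server}{SEP}{event_type}{SEP}{timestamp_ms:013d}"


def prefix_scan(sorted_keys: List[str], prefix: str) -> List[str]:
    # Because keys are sorted, every key sharing a prefix is contiguous:
    # find the first candidate with binary search, then read until it ends.
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


def hashed_key(natural_key: str, buckets: int = 16) -> str:
    # A deterministic hash-derived prefix spreads sequential keys across
    # regions; the same input always maps to the same stored key, so
    # single-cell reads can still recompute it.
    digest = hashlib.md5(natural_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    return f"{bucket:02d}{SEP}{natural_key}"


if __name__ == "__main__":
    keys = sorted([
        composite_key("web01", "ERROR", 1700000000000),
        composite_key("web01", "INFO", 1700000000500),
        composite_key("web02", "ERROR", 1700000000100),
    ])
    print(prefix_scan(keys, "web01" + SEP))                  # both web01 rows
    print(prefix_scan(keys, "web01" + SEP + "ERROR" + SEP))  # one row
```

Note the trade-off: a hashed prefix gives up the contiguous ranges that make prefix scans cheap, which is why randomized keys suit single-cell lookups rather than bucket or range queries.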
Another factor to consider when creating tables in HBase is normalization
versus denormalization. Because HBase does not support table joins, it is