HBase is a NoSQL database system included in the standard Hadoop distributions. Logically,
it is a key-value store: each row is identified by a key and has a number of bins (or columns)
in which its values are stored. The only data type is the byte string. Physically, groups of
similar columns are stored together in column families. HBase is most often accessed via Java
code, but APIs exist for using it with Pig, Thrift, Jython (Python-based), and others. HBase
is not normally accessed in a MapReduce fashion. It also has a shell for interactive use.
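To make the logical model concrete, here is a toy, in-memory sketch in Python. It is not the HBase client API (real applications would typically use the Java client); it only mirrors the ideas above: a row key maps to column families, each family holds column qualifiers, and every key and value is a byte string. The class name `ToyTable` and the `title` and `rating` columns are hypothetical, chosen for illustration.

```python
from typing import Dict, Iterable, Optional

# Logical layout of one row: column family -> column qualifier -> value.
Row = Dict[bytes, Dict[bytes, bytes]]


class ToyTable:
    """A toy, in-memory model of HBase's logical data model.

    This is NOT the HBase client API. It only mirrors the ideas in the text:
    rows identified by a key, values stored in columns grouped into column
    families, and byte strings as the only data type.
    """

    def __init__(self, families: Iterable[bytes]):
        # Column families are declared when the table is created (as with
        # the shell's 'create'); qualifiers within a family are free-form.
        self.families = set(families)
        self.rows: Dict[bytes, Row] = {}

    def put(self, row: bytes, family: bytes, qualifier: bytes, value: bytes) -> None:
        # Everything is a byte string; callers must encode numbers, text, etc.
        for part in (row, family, qualifier, value):
            if not isinstance(part, bytes):
                raise TypeError("only byte strings can be stored")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        self.rows.setdefault(row, {}).setdefault(family, {})[qualifier] = value

    def get(self, row: bytes, family: bytes, qualifier: bytes) -> Optional[bytes]:
        # Returns None when the cell was never written.
        return self.rows.get(row, {}).get(family, {}).get(qualifier)


# Usage with a hypothetical 'reviews' table that has one family, 'cf1'.
reviews = ToyTable([b"cf1"])
reviews.put(b"review-001", b"cf1", b"title", b"Dune")
reviews.put(b"review-001", b"cf1", b"rating", str(9).encode())
print(reviews.get(b"review-001", b"cf1", b"rating"))  # b'9'
```

Note that the rating comes back as the byte string `b'9'`, not the integer 9; converting it is the application's job.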
HBase is often used for applications that may require sparse rows; that is, each row may use
only a few of the defined columns. Access is fast (as Hadoop goes) when elements are retrieved
through the row key, which serves as the primary key, and HBase is highly scalable. Unlike
traditional HDFS applications, which read data sequentially, it permits random access to
individual rows.
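The following Python sketch illustrates sparse rows and key-based access using plain dictionaries. The table contents and column names are made up, and a dictionary lookup merely stands in for HBase's own mechanism (rows kept sorted by key and split into regions), which this sketch does not model. The `sequential_search` function is a hypothetical stand-in for reading a flat file front to back.

```python
from typing import Dict, List, Optional, Tuple

Columns = Dict[bytes, bytes]

# Toy data (hypothetical keys and columns). Each row stores only the columns
# it actually uses, so a missing column takes up no space at all.
table: Dict[bytes, Columns] = {
    b"user#001": {b"cf1:name": b"Ann"},
    b"user#002": {b"cf1:name": b"Bo", b"cf1:phone": b"555-0100"},
    b"user#003": {b"cf1:email": b"c@example.com"},
}


def get_row(table: Dict[bytes, Columns], row_key: bytes) -> Optional[Columns]:
    """Random access: go straight to one row by its key."""
    return table.get(row_key)


def sequential_search(records: List[Tuple[bytes, Columns]],
                      row_key: bytes) -> Tuple[Optional[Columns], int]:
    """Sequential access: read records in order until the key is found.
    Returns the row (or None) and how many records had to be read."""
    reads = 0
    for key, columns in records:
        reads += 1
        if key == row_key:
            return columns, reads
    return None, reads


# Sparseness: 3 rows x 3 distinct columns = 9 possible cells, but only 4 are stored.
distinct_columns = {col for cols in table.values() for col in cols}
stored_cells = sum(len(cols) for cols in table.values())
print(len(table) * len(distinct_columns), stored_cells)  # 9 4

print(get_row(table, b"user#003"))  # {b'cf1:email': b'c@example.com'}
print(sequential_search(list(table.items()), b"user#003")[1])  # 3
```

The key lookup touches one row regardless of table size, while the sequential version had to read every record before reaching `user#003`.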
Although HBase is faster than MapReduce, you should not use it for transactional workloads or
for relational analytics. It does not support secondary indexes, so finding all rows where a
given column has a specific value is tedious and must be done at the application level.
HBase has no JOIN operation; joins must be performed by the application. You must also
provide security at the application level; other tools, such as Accumulo (described here), are
built with security in mind.
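Because HBase offers neither secondary indexes nor joins, the application has to do both jobs itself. The Python sketch below is a minimal illustration under stated assumptions: the tables are hypothetical dictionaries rather than a real HBase client, and `rows_where` and `join_city` are made-up helper names. It shows a full-scan filter on a column value and a hand-written join that fetches related rows by key.

```python
from typing import Dict, List, Optional, Tuple

Table = Dict[bytes, Dict[bytes, bytes]]

# Hypothetical toy tables: row key -> {"family:qualifier": value}, all bytes.
reviews: Table = {
    b"r1": {b"cf1:reviewer": b"alice", b"cf1:title": b"Dune", b"cf1:rating": b"9"},
    b"r2": {b"cf1:reviewer": b"bob", b"cf1:title": b"Alien", b"cf1:rating": b"7"},
    b"r3": {b"cf1:reviewer": b"carol", b"cf1:title": b"Dune", b"cf1:rating": b"6"},
}
reviewers: Table = {
    b"alice": {b"cf1:city": b"Boston"},
    b"bob": {b"cf1:city": b"Denver"},
}


def rows_where(table: Table, column: bytes, value: bytes) -> List[bytes]:
    """No secondary index: every row must be examined to find matches."""
    return [key for key, cols in table.items() if cols.get(column) == value]


def join_city(review_table: Table,
              reviewer_table: Table) -> List[Tuple[bytes, bytes, Optional[bytes]]]:
    """No JOIN: for each review, the application fetches the reviewer's row
    by key and combines the two results itself (None if no reviewer row)."""
    joined = []
    for key, cols in review_table.items():
        reviewer_row = reviewer_table.get(cols[b"cf1:reviewer"], {})
        joined.append((key, cols[b"cf1:title"], reviewer_row.get(b"cf1:city")))
    return joined


dune_keys = rows_where(reviews, b"cf1:title", b"Dune")
print(dune_keys)  # [b'r1', b'r3']
ratings = [int(reviews[k][b"cf1:rating"].decode()) for k in dune_keys]
print(sum(ratings) / len(ratings))  # 7.5
print(join_city(reviews, reviewers)[2])  # (b'r3', b'Dune', None): carol has no reviewer row
```

With millions of rows, the full scan in `rows_where` is exactly the tedious part; an application that needs such lookups often maintains its own extra table keyed by the column value and keeps it in sync on every write.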
While Cassandra (described here) and MongoDB (described here) might still be the predominant
NoSQL databases today, HBase is gaining in popularity and may well be the leader in the near
future.
Tutorial Links
The folks at Coreservlets.com have put together a handful of Hadoop tutorials, including an
excellent series on HBase. There are also a number of video tutorials available on the Internet,
including this one, which we found particularly helpful.
Example Code
In this example, your goal is to find the average rating for the movie Dune. Each movie review
has three elements: a reviewer name, a film title, and a rating (an integer from 0 to 10). The
example is done in the HBase shell:
hbase(main):008:0> create 'reviews', 'cf1'
0 row(s) in 1.0710 seconds