Database Reference
Implementation language : Java
Distributed : Yes. You can run HBase in standalone, pseudodistributed, or fully distributed
mode. Pseudodistributed mode means that you have several instances of HBase, but they're
all running on the same host.
Storage : HBase provides Bigtable-like capabilities on top of the Hadoop Distributed File System (HDFS).
Schema : HBase supports unstructured and partially structured data. To do so, data is organized
into column families (a term that also appears in discussions of Apache Cassandra). You address
an individual record, called a “cell” in HBase, with a combination of row key, column family,
column qualifier, and timestamp. Unlike an RDBMS, in which you must define a table's columns
well in advance, HBase lets you simply name a column family and then allow the column
qualifiers to be determined at runtime. This gives you a great deal of flexibility and supports
an agile approach to development.
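
The addressing scheme described above can be pictured as a nested map. The following is a minimal sketch in Python, assuming a hypothetical in-memory class named ToyTable; it is not the HBase client API, and the row keys, family names, and values are invented for illustration. It shows only that column families are declared up front, while qualifiers appear on first write and each cell keeps one value per timestamp.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for an HBase-style table (not the HBase API)."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        # row key -> (family, qualifier) -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value, timestamp):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers need no prior declaration; they exist once written.
        self.rows[row][(family, qualifier)][timestamp] = value

    def get(self, row, family, qualifier, timestamp=None):
        versions = self.rows.get(row, {}).get((family, qualifier), {})
        if not versions:
            return None
        if timestamp is None:
            # Default to the most recent version of the cell.
            timestamp = max(versions)
        return versions.get(timestamp)


table = ToyTable(families=["info"])
table.put("user1", "info", "email", "a@example.com", timestamp=1)
table.put("user1", "info", "email", "b@example.com", timestamp=2)
table.put("user1", "info", "nickname", "al", timestamp=2)  # new qualifier at runtime

latest_email = table.get("user1", "info", "email")
first_email = table.get("user1", "info", "email", timestamp=1)
```

In this sketch, writing to an undeclared family fails, whereas a new qualifier such as nickname is accepted without any schema change, which mirrors the flexibility described in the Schema entry.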
Client : You can interact with HBase via Thrift, a RESTful service gateway, Protobuf (see
“Additional Features” below), or an extensible JRuby shell.
Open source : Yes (Apache License)
Production use : HBase has been used at Adobe since 2008. It is also used at Twitter, Mahalo,
StumbleUpon, Ning, Hulu, WorldLingo, Detikcom in Indonesia, and Yahoo!.
Additional features : Because HBase is part of the Hadoop project, it features tight integration
with Hadoop. There is a set of convenience classes that allow you to easily execute
MapReduce jobs using HBase as the backing data store.
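
To give a feel for what a MapReduce job over table rows looks like, here is a conceptual, single-process sketch in Python. It does not use Hadoop or the HBase convenience classes; the function run_map_reduce, the sample table pages, and its column names are all hypothetical. The mapper sees one row at a time and emits key/value pairs, and the reducer combines all values emitted under the same key.

```python
from collections import defaultdict


def run_map_reduce(rows, mapper, reducer):
    """Toy map/reduce over a dict of row_key -> columns (illustration only)."""
    grouped = defaultdict(list)
    for row_key, columns in rows.items():
        # Map phase: each row may emit zero or more (key, value) pairs.
        for key, value in mapper(row_key, columns):
            grouped[key].append(value)
    # Reduce phase: combine every value collected under the same key.
    return {key: reducer(key, values) for key, values in grouped.items()}


# Hypothetical table contents: row key -> {"family:qualifier": value}
pages = {
    "page1": {"meta:lang": "en", "meta:title": "Home"},
    "page2": {"meta:lang": "id"},
    "page3": {"meta:lang": "en"},
}


def lang_mapper(row_key, columns):
    lang = columns.get("meta:lang")
    if lang is not None:
        yield lang, 1


def count_reducer(key, values):
    return sum(values)


counts = run_map_reduce(pages, lang_mapper, count_reducer)
```

In a real deployment the rows would be read from region servers in parallel rather than from a local dictionary; the sketch shows only the shape of the computation.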
HBase requires ZooKeeper to run. ZooKeeper, also part of the Hadoop project, is a centralized
service for maintaining configuration information and distributed synchronization across nodes
in a cluster. Although this does add an external dependency, it makes maintaining the cluster
easier and helps simplify the HBase core.
HBase allows you to use Google's Protobuf (Protocol Buffers) API as an alternative to XML.
Protobuf is a very efficient way of serializing data. It encodes the same data in a form two to
three times smaller than XML, and is 20-100 times faster to parse than XML because of the
compact binary format that protocol buffers use on the wire. This can make working with
HBase very fast. Protobuf is used extensively within Google, which defines nearly 50,000
different message types in Protobuf across a wide variety of systems. Check out the Protobuf
Google code project at http://code.google.com/p/protobuf .
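
The size advantage comes largely from the binary encoding. The following Python sketch hand-encodes a single unsigned integer field using the Protobuf varint wire format and compares it with an XML element carrying the same value. It does not use the protobuf library; the helper names and the field name id are assumptions made for illustration, and the ratio for this one field is not a benchmark of real messages.

```python
def encode_varint(n):
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    if n < 0:
        raise ValueError("this sketch handles only non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


def encode_uint_field(field_number, value):
    """Encode one varint field: key = (field_number << 3) | wire type 0."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


binary = encode_uint_field(1, 150)   # hypothetical field number 1
xml = b"<id>150</id>"                # the same value as an XML element

print(binary.hex(), len(binary), len(xml))
```

Here the tag and value together occupy 3 bytes in the binary form, against 12 bytes for the XML element, because the field is identified by a small number rather than by a repeated textual name.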
The database comes with a web console user interface to monitor and manage region servers and
master servers.