The Nonrelational Landscape - Cassandra: The Definitive Guide


▪ Implementation language : Java

▪ Distributed : Yes. You can run HBase in standalone, pseudodistributed, or fully distributed mode. In pseudodistributed mode, several instances of HBase run as separate processes, but all on the same host.

▪ Storage : HBase provides Bigtable-like capabilities on top of the Hadoop Distributed File System (HDFS).

▪ Schema : HBase supports unstructured and partially structured data. To do so, data is organized into column families (a term that also appears in discussions of Apache Cassandra). You address an individual record, called a “cell” in HBase, with a combination of row key, column family, cell qualifier, and timestamp. Unlike an RDBMS, in which you must define your table schema well in advance, HBase lets you simply name a column family and then allow the cell qualifiers to be determined at runtime. This gives you a great deal of flexibility and supports an agile approach to development.

▪ Client : You can interact with HBase via Thrift, a RESTful service gateway, Protobuf (see “Additional features” below), or an extensible JRuby shell.

▪ Open source : Yes (Apache License)

▪ Production use : HBase has been used at Adobe since 2008. It is also used at Twitter, Mahalo, StumbleUpon, Ning, Hulu, World Lingo, Detikcom in Indonesia, and Yahoo!.

▪ Additional features : Because HBase is part of the Hadoop project, it features tight integration with Hadoop. There is a set of convenience classes that allow you to easily execute MapReduce jobs using HBase as the backing data store.

HBase requires ZooKeeper to run. ZooKeeper, also part of the Hadoop project, is a centralized service for maintaining configuration information and providing distributed synchronization across the nodes in a cluster. Although this does add an external dependency, it makes maintaining the cluster easier and helps simplify the HBase core.

HBase allows you to use Google's Protobuf (Protocol Buffers) API as an alternative to XML. Protobuf is a very efficient way of serializing data. It encodes the same data in a form two to three times smaller than XML, and it is 20-100 times faster to parse than XML because of the way protocol buffers encode bytes on the wire. This can make working with HBase very fast. Protobuf is used extensively within Google, where nearly 50,000 different message types are defined across a wide variety of systems. Check out the Protobuf Google code project at http://code.google.com/p/protobuf .
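As a rough illustration of why the wire format is compact, the following minimal sketch hand-encodes a single integer field the way Protocol Buffers does, using a base-128 "varint" preceded by a one-byte tag, and compares it with an XML rendering of the same value. The helper names and the XML element name `a` are hypothetical, chosen only for this example; a real application would use generated Protobuf classes rather than encoding bytes by hand, and the size ratio for one field says nothing definitive about whole messages.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint (7 data bits per byte)."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def encode_int_field(field_number, value):
    """Encode one integer field: tag (field number plus wire type 0), then the value."""
    tag = (field_number << 3) | 0    # wire type 0 means "varint"
    return encode_varint(tag) + encode_varint(value)


# Field number 1 holding the value 150.
wire = encode_int_field(1, 150)

# The same value as XML; the element name "a" is made up for this comparison.
xml = b"<a>150</a>"

print(wire.hex(), len(wire), len(xml))
```

The tag and the value together take three bytes here, while the XML form spends most of its ten bytes on markup and on writing the number as text.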

The database comes with a web console user interface to monitor and manage region servers and master servers.
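To make the cell addressing described under Schema more concrete, here is a small in-memory sketch of the idea: column families are fixed when the table is created, qualifiers are accepted at write time without being declared, and each cell is located by row key, column family, qualifier, and timestamp. The `ToyTable` class and its methods are hypothetical and exist only for illustration; this is not the HBase client API, and it ignores storage, regions, and distribution entirely.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for an HBase-style table (illustration only)."""

    def __init__(self, column_families):
        # Column families are the only part of the schema fixed up front.
        self.column_families = set(column_families)
        # (row key, family, qualifier) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row, family, qualifier, value, timestamp):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are never declared; any name is accepted at write time.
        self._cells[(row, family, qualifier)][timestamp] = value

    def get(self, row, family, qualifier, timestamp=None):
        versions = self._cells.get((row, family, qualifier))
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)  # default to the newest version
        return versions.get(timestamp)


table = ToyTable(column_families=["info"])
table.put("user42", "info", "email", "a@example.com", timestamp=1)
table.put("user42", "info", "email", "b@example.com", timestamp=2)
table.put("user42", "info", "nickname", "bee", timestamp=2)  # new qualifier, no schema change

print(table.get("user42", "info", "email"))               # newest version
print(table.get("user42", "info", "email", timestamp=1))  # older version
```

Writing to a family that was not named at creation time fails, while a brand-new qualifier such as `nickname` is simply stored, which mirrors the flexibility the Schema entry describes.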
