Database Reference
Storage : Google File System (GFS). Files are divided into 64-megabyte chunks, and a typical
write operation only ever appends to a file in order to maximize throughput. A driving
principle of GFS is that the filesystem must run on banks of inexpensive commodity servers,
which can be prone to failure, and therefore it must be able to manage availability in such
a scenario. GFS features two server types: one master node and many chunkservers. The
chunkservers store the data chunk files, and the master node stores all of the metadata
about the chunks, such as the location of some particular piece of data. This is a clear
point where Cassandra diverges from Bigtable's design, as Cassandra nodes are all the same
and there is no master server centrally controlling the ring.
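The split between master and chunkservers can be illustrated with a small sketch. This is not GFS code: the ToyMaster class, the file name, and the chunkserver names are hypothetical, and only the 64-megabyte chunk size comes from the description above. It shows how a byte offset maps to a chunk, and how a metadata-only master could answer which chunkservers hold that chunk.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size given above


def chunk_index(byte_offset: int) -> int:
    """Return the index of the chunk that holds the given byte offset."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    return byte_offset // CHUNK_SIZE


class ToyMaster:
    """Hypothetical stand-in for the master: metadata only, no file data."""

    def __init__(self):
        # (filename, chunk index) -> names of the chunkservers holding it
        self._locations = {}

    def register(self, filename, index, servers):
        self._locations[(filename, index)] = list(servers)

    def locate(self, filename, byte_offset):
        """Return (chunk index, chunkservers) for a byte offset in a file."""
        index = chunk_index(byte_offset)
        return index, self._locations.get((filename, index), [])


# Assumed example data: one file whose first two chunks are replicated.
master = ToyMaster()
master.register("logs/web.log", 0, ["cs-1", "cs-2"])
master.register("logs/web.log", 1, ["cs-2", "cs-3"])

# Byte 70 MB falls in the second chunk (index 1).
print(master.locate("logs/web.log", 70 * 1024 * 1024))
```

In this toy model the master is a single point of coordination, which is exactly the role that has no counterpart in a Cassandra ring.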
Schema : The data model in Bigtable is a sparse, distributed, multidimensional sorted map.
It allows you to store data in a richer way than, say, Amazon SimpleDB, as you can use list
types. The map is indexed using a row key, a column key, and a timestamp; the values
themselves are uninterpreted byte arrays.
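A single-machine sketch can make the shape of this data model concrete. It is an illustration only, not Bigtable's implementation or API: the ToySortedMap class and the sample keys are hypothetical, and the distributed aspect is left out entirely. Cells are addressed by (row key, column key, timestamp), kept in sorted order, and hold raw bytes; cells that were never written take up no space, which is the sense in which the map is sparse.

```python
import bisect


class ToySortedMap:
    """Hypothetical in-memory model of a (row, column, timestamp) -> bytes map."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # same key -> uninterpreted bytes

    def put(self, row: str, column: str, timestamp: int, value: bytes):
        key = (row, column, -timestamp)  # negated so newer versions sort first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row: str, column: str):
        """Return the newest value stored for a cell, or None if it is empty."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row: str, end_row: str):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


# Assumed sample data: two versions of one cell, plus one cell in another row.
table = ToySortedMap()
table.put("com.example", "contents", 1, b"<html>v1</html>")
table.put("com.example", "contents", 2, b"<html>v2</html>")
table.put("org.apache", "anchor", 1, b"Apache")

print(table.get("com.example", "contents"))  # newest version of the cell
print(list(table.scan("com", "d")))          # range scan over sorted row keys
```

Because the keys are kept sorted, a range of rows can be read with one lookup followed by a sequential walk, which is the property a sorted map offers over a plain hash table.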
Client : C++. Queries are also sometimes written in Sawzall, a scripting language developed
at Google. Initially, the Sawzall API did not support writing values to the database, but
did allow data filtering, transformation, and summarizing. Bigtable is typically used as
both an input source and an output target for MapReduce jobs.
Open source : No
Additional features : While Bigtable itself is not directly available for your own use, you
can use it indirectly if you build an application with Google App Engine. Bigtable was
designed with use of the MapReduce algorithm in mind. There are a few clones of Bigtable,
and Hadoop is an open source implementation of MapReduce.
HBase
HBase is a clone of Google's Bigtable, originally created for use with Hadoop (it's actually a
subproject of the Apache Hadoop project). Just as Google's Bigtable is built on the Google
File System (GFS), HBase provides database capabilities on top of Hadoop, and it can serve as
a source or sink for MapReduce jobs. Unlike some other columnar databases that provide
eventual consistency, HBase is strongly consistent.
Interestingly, Microsoft became a contributor to HBase following its acquisition of Powerset.
Website : http://hbase.apache.org
Orientation : Columnar
Created : HBase was created at Powerset in 2007 and later donated to Apache.