The Nonrelational Landscape - Cassandra: The Definitive Guide


▪ Storage : Google File System (GFS). Files are divided into 64-megabyte chunks, and a typical write operation only ever appends to the files in order to provide maximum throughput. A driving principle of GFS is that the filesystem must run on banks of inexpensive commodity servers, which can be prone to failure, and it must therefore be able to manage availability in such a scenario. GFS features two server types: one master node and many chunkservers. The chunkservers store the data chunk files, and the master node stores all of the metadata about the chunks, such as the location of a particular piece of data. This is a clear point where Cassandra diverges from Bigtable's design, as Cassandra nodes are all the same and there is no master server centrally controlling the ring.

▪ Schema : The data model in Bigtable is a sparse, distributed, multidimensional sorted map. It allows you to store data in a richer way than, say, Amazon SimpleDB, as you can use list types. The map is indexed using a row key, a column key, and a timestamp; the values themselves are uninterpreted byte arrays.

▪ Client : C++. Queries are also sometimes written in Sawzall, a scripting language developed at Google. Initially, the Sawzall API did not support writing values to the database, but did allow data filtering, transformation, and summarizing. Bigtable is typically used as both an input source and an output target for MapReduce jobs.

▪ Open source : No

▪ Additional features : While Bigtable itself is not directly available for your own use, you can use it indirectly if you build an application with Google App Engine. Bigtable was designed with use of the MapReduce algorithm in mind. There are a few clones of Bigtable, and Hadoop is an open source implementation of MapReduce.
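
The Schema entry above describes a sorted map indexed by row key, column key, and timestamp, with uninterpreted bytes as values. The following is a minimal, single-process sketch of that idea in Python. It is not Bigtable's API or implementation: the class name SparseSortedMap, its methods, and the sample row and column names are all hypothetical, chosen only to illustrate the indexing scheme.

```python
import bisect


class SparseSortedMap:
    """Toy in-memory map keyed by (row, column, timestamp). Hypothetical name."""

    def __init__(self):
        # Keys are kept sorted by row, then column, then newest timestamp first
        # (the timestamp is negated so that larger timestamps sort earlier).
        self._keys = []
        # Only cells that were actually written are stored, hence "sparse".
        self._cells = {}

    def put(self, row, column, timestamp, value: bytes):
        key = (row, column, -timestamp)
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value  # the value is treated as opaque bytes

    def get(self, row, column):
        """Return (timestamp, value) for the newest version, or None."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            key = self._keys[i]
            return -key[2], self._cells[key]
        return None

    def scan_row(self, row):
        """Yield (column, timestamp, value) for one row in sorted key order."""
        i = bisect.bisect_left(self._keys, (row,))
        while i < len(self._keys) and self._keys[i][0] == row:
            key = self._keys[i]
            yield key[1], -key[2], self._cells[key]
            i += 1


table = SparseSortedMap()
table.put("com.example", "contents:html", 1, b"<html>v1</html>")
table.put("com.example", "contents:html", 2, b"<html>v2</html>")
table.put("com.example", "anchor:home", 1, b"Example")
table.put("org.apache", "contents:html", 5, b"<html>apache</html>")

newest = table.get("com.example", "contents:html")
row_cells = list(table.scan_row("com.example"))
```

Because the keys stay sorted, reading a single row is a contiguous range scan, and the newest version of a cell is the first entry found for its row and column. The "distributed" part of the model, in which ranges of rows are spread across many servers, is deliberately left out of this sketch.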

HBase

HBase is a clone of Google's Bigtable, originally created for use with Hadoop (it's actually a subproject of the Apache Hadoop project). Much as Google's Bigtable builds on the Google File System (GFS), HBase provides database capabilities for Hadoop, allowing you to use it as a source or sink for MapReduce jobs. Unlike some other columnar databases that provide eventual consistency, HBase is strongly consistent.

Microsoft is a contributor to HBase, following its acquisition of Powerset.

▪ Website : http://hbase.apache.org

▪ Orientation : Columnar

▪ Created : HBase was created at Powerset in 2007 and later donated to Apache.
