The Nonrelational Landscape - Cassandra: The Definitive Guide

within Google as the underlying data store, supporting more than 60 projects, including Gmail, YouTube, Google Analytics, Google Finance, Orkut, Personalized Search, and Google Earth. Bigtable runs on top of the Google File System (GFS).

It is useful to understand Bigtable, at least to a certain degree, because many of its attributes and design decisions are explicitly copied in Cassandra. Although Cassandra gets its design for consistency and partition tolerance from Amazon Dynamo, Cassandra's data model is based more closely on Bigtable's. For example, Cassandra borrows from Bigtable (sometimes with modification) the implementation of SSTables, memtables, Bloom filters, and compactions (see the Glossary for definitions of these terms; they are explored in detail elsewhere in this book as appropriate). In this way, Cassandra supports a somewhat richer data model than Dynamo: something more flexible and layered than a simple key-value store, because it supports sparse, semistructured data.
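To make the contrast between a simple key-value store and sparse, semistructured data concrete, here is a minimal sketch using plain Python dictionaries. The row keys, column names, and the helpers `columns_of` and `read_column` are invented for illustration; they are not part of Cassandra's or Bigtable's API, and real storage is considerably more involved.

```python
# Illustrative only: the row keys, column names, and helpers below are made up
# and are not part of Cassandra's or Bigtable's API.

# A simple key-value store: one opaque value per key.
key_value_store = {"alice": b"<serialized profile blob>"}

# A sparse, semistructured table: row key -> {column name -> value}.
# Rows do not have to share the same set of columns.
sparse_table = {
    "alice": {"email": "alice@example.com", "city": "Paris"},
    "bob": {"email": "bob@example.com", "phone": "555-0100"},
}


def columns_of(table, row_key):
    """Return the sorted column names actually present in a row."""
    return sorted(table.get(row_key, {}))


def read_column(table, row_key, column):
    """Return one column's value, or None if the row lacks that column."""
    return table.get(row_key, {}).get(column)
```

In this sketch, a column that a row does not use simply does not exist for that row, which is what "sparse" means here; the key-value store, by contrast, can only return the whole blob.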

NOTE

I very much encourage you to read the Google Bigtable paper; it's an excellent read. However, keep in mind that although Cassandra borrows many key ideas from Bigtable, there is not generally a 1:1 correspondence in ideas or implementation. For example, Bigtable defines master and slave nodes, whereas Cassandra does not. And although Cassandra's data model and storage mechanism are based on Bigtable's and use the same terminology in many places, this is not always the case: Bigtable reads and writes are close but not identical to their Cassandra implementations; Bigtable defines a Tablet structure that is not strictly present in Cassandra; and so on. You can read the paper at http://labs.google.com/papers/bigtable.html.

Cassandra does contrast with Bigtable in several areas, however, not least of which is that Cassandra maintains a decentralized model. In Bigtable there is a master server that controls operations using the Chubby persistent distributed locking mechanism; in Cassandra, all the nodes are peers with no centralized control, and they communicate using a gossip model.
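The general idea behind gossip-style communication can be sketched in a few lines. This is a toy model, not Cassandra's actual gossip protocol: the node names, the integer "version" per node, and the functions `gossip_round` and `run_until_converged` are all assumptions made for illustration. Each node starts out knowing only about itself, repeatedly exchanges what it knows with one randomly chosen peer, and no node plays a coordinating role.

```python
import random


def gossip_round(states, rng):
    """One round: every node exchanges state with one randomly chosen peer."""
    nodes = list(states)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        # Merge both views, keeping the highest version seen for each node.
        merged = dict(states[node])
        for name, version in states[peer].items():
            merged[name] = max(merged.get(name, 0), version)
        states[node] = merged
        states[peer] = dict(merged)


def run_until_converged(states, rng, max_rounds=100):
    """Gossip until every node holds the same view; return the round count."""
    for round_no in range(1, max_rounds + 1):
        gossip_round(states, rng)
        views = list(states.values())
        if all(view == views[0] for view in views):
            return round_no
    return None  # did not converge within max_rounds


# Hypothetical five-node cluster: each node initially knows only itself.
cluster = {name: {name: 1} for name in ("n1", "n2", "n3", "n4", "n5")}
rounds_needed = run_until_converged(cluster, random.Random(42))
```

After the loop finishes, every entry in `cluster` lists all five nodes, even though no single node was ever responsible for distributing that information.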

Bigtable relies on a distributed lock service called Chubby for several different things: ensuring that there is at most a single master replica at any given time; managing server bootstrapping, discovery, and death; and storing the schema information.
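The first of these uses, guaranteeing at most one master at a time, follows naturally from having an exclusive lock. The sketch below uses a toy in-memory lock table to show the idea; `ToyLockService`, its methods, and `elect_master` are hypothetical names and do not reflect Chubby's real interface, which is distributed and fault tolerant.

```python
class ToyLockService:
    """In-memory stand-in for a distributed lock service (NOT Chubby's API)."""

    def __init__(self):
        self._holders = {}  # lock name -> current owner

    def try_acquire(self, lock_name, owner):
        """Grant the lock if it is free; report whether `owner` now holds it."""
        # setdefault stores `owner` only when no holder is recorded yet.
        return self._holders.setdefault(lock_name, owner) == owner

    def release(self, lock_name, owner):
        """Release the lock, but only if `owner` is the current holder."""
        if self._holders.get(lock_name) == owner:
            del self._holders[lock_name]

    def holder(self, lock_name):
        return self._holders.get(lock_name)


def elect_master(locks, candidates):
    """Every candidate tries for the 'master' lock; return those that got it."""
    return [c for c in candidates if locks.try_acquire("master", c)]


locks = ToyLockService()
winners = elect_master(locks, ["server-a", "server-b", "server-c"])
```

Because the lock is exclusive, `winners` can never contain more than one server; another candidate can take over only after the current holder releases the lock.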

▪ Website: None, but you might be interested in a related project called Google Fusion Tables, which is available at http://tables.googlelabs.com.
▪ Orientation: Columnar
▪ Created: By Google, Inc. Development started in 2004, with the paper published in 2006.
▪ Implementation language: C++
▪ Distributed: Yes
