Introducing Cassandra - Cassandra: The Definitive Guide

So what does it mean in practical terms to support only two of the three facets of CAP?

CA

To primarily support Consistency and Availability means that you're likely using two-phase commit for distributed transactions. It means that the system will block when a network partition occurs, so it may be that your system is limited to a single data center cluster in an attempt to mitigate this. If your application needs only this level of scale, this is easy to manage and allows you to rely on familiar, simple structures.

CP

To primarily support Consistency and Partition Tolerance, you may try to advance your architecture by setting up data shards in order to scale. Your data will be consistent, but you still run the risk of some data becoming unavailable if nodes fail.

AP

To primarily support Availability and Partition Tolerance, your system may return inaccurate data, but the system will always be available, even in the face of network partitioning. DNS is perhaps the most popular example of a system that is massively scalable, highly available, and partition-tolerant.
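
To make the CP and AP trade-off concrete, here is a minimal Python sketch. It is a toy model, not how Cassandra or any real database is implemented: the `TwoReplicaStore` class, the `Unavailable` exception, and the `demo` helper are all hypothetical names invented for this illustration. It assumes two replicas that normally replicate every write synchronously, and a flag that simulates a network partition between them. The CA case is not modeled.

```python
class Unavailable(Exception):
    """Raised when a CP-style store refuses a request during a partition."""


class TwoReplicaStore:
    """Toy store with two replicas, 'a' and 'b' (hypothetical, for illustration)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"a": {}, "b": {}}
        self.partitioned = False  # True while a and b cannot reach each other

    def write(self, replica, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse rather than let replicas diverge.
                raise Unavailable("partition: write refused")
            # Availability first: accept locally; the other replica goes stale.
            self.replicas[replica][key] = value
            return
        # No partition: replicate synchronously to both replicas.
        for data in self.replicas.values():
            data[key] = value

    def read(self, replica, key):
        if self.partitioned and self.mode == "CP":
            # Simplification: a CP store here refuses all requests while split.
            raise Unavailable("partition: read refused")
        return self.replicas[replica].get(key)


def demo(mode):
    """Write once, partition the replicas, write again, then read from 'b'."""
    store = TwoReplicaStore(mode)
    store.write("a", "color", "red")
    store.partitioned = True
    try:
        store.write("a", "color", "blue")
        return store.read("b", "color")
    except Unavailable:
        return "unavailable"
```

Under these assumptions, `demo("CP")` returns `"unavailable"` because the store refuses the write during the partition, while `demo("AP")` returns the stale value `"red"` from replica `b`: the system stayed available, but the data it returned was no longer accurate.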

NOTE

Note that this depiction is intended to offer an overview that helps draw distinctions between the broader contours in these systems; it is not strictly precise. For example, it's not entirely clear where Google's Bigtable should be placed on such a continuum. The Google paper describes Bigtable as “highly available,” but later goes on to say that if Chubby (the Bigtable persistent lock service) “becomes unavailable for an extended period of time [caused by Chubby outages or network issues], Bigtable becomes unavailable” (section 4). On the matter of data reads, the paper says that “we do not consider the possibility of multiple copies of the same data, possibly in alternate forms due to views or indices.” Finally, the paper indicates that “centralized control and Byzantine fault tolerance are not Bigtable goals” (section 10). Given such variable information, you can see that determining where a database falls on this sliding scale is not an exact science.

Row-Oriented

Cassandra is frequently referred to as a “column-oriented” database, which is not incorrect. It's not relational, and it does represent its data structures in sparse multidimensional hashtables. “Sparse” means that for any given row you can have one or more columns, but each row doesn't need to have all the same columns as other rows like it (as in a relational model). Each row has a unique key, which makes its data accessible. So although it's not wrong to say that Cassandra
