So what does it mean in practical terms to support only two of the three facets of CAP?
CA
To primarily support Consistency and Availability means that you're likely using two-phase
commit for distributed transactions. It means that the system will block when a network
partition occurs, so your system may be limited to a single data center cluster in an
attempt to mitigate this. If your application needs only this level of scale, this approach is easy to
manage and allows you to rely on familiar, simple structures.
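As a rough illustration of the two-phase commit idea mentioned above, here is a minimal Python sketch. The Participant class and the two_phase_commit function are hypothetical names invented for this example, not part of any real database API. It also simplifies one thing: a real coordinator would typically block or wait on a timeout when a participant is unreachable, whereas this sketch treats an unreachable participant as an immediate "no" vote.

```python
class Participant:
    """Hypothetical resource manager taking part in a distributed transaction."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable  # False simulates a network partition
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote on whether this node can commit.
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.state = "prepared"
        return True

    def commit(self):
        # Phase 2, success path: make the change durable.
        self.state = "committed"

    def rollback(self):
        # Phase 2, failure path: undo prepared work on nodes we can reach.
        if self.reachable:
            self.state = "aborted"


def two_phase_commit(participants):
    """Return 'committed' only if every participant votes yes in phase 1."""
    try:
        votes = [p.prepare() for p in participants]
    except ConnectionError:
        # Assumption: an unreachable node counts as a "no" vote.
        votes = [False]

    if all(votes):
        for p in participants:
            p.commit()
        return "committed"

    for p in participants:
        p.rollback()
    return "aborted"
```

With two reachable participants the function returns "committed". If either one is constructed with reachable=False, nothing can be committed anywhere, which is the consistency-over-progress trade-off the paragraph describes.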
CP
To primarily support Consistency and Partition Tolerance, you may try to advance your
architecture by setting up data shards in order to scale. Your data will be consistent, but you
still run the risk of some data becoming unavailable if nodes fail.
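The sharding trade-off can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the ShardedStore class, the ShardUnavailable exception, and the hash-modulo routing are all made up for this example, and real systems usually add replication and rebalancing. Each key lives on exactly one shard, so reads are never stale, but a failed shard takes its keys offline.

```python
import hashlib


class ShardUnavailable(Exception):
    """Raised when the shard that owns a key is down (hypothetical)."""


class ShardedStore:
    """Toy key-value store that routes each key to exactly one shard."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)
        self.shards = {name: {} for name in self.shard_names}
        self.down = set()

    def shard_for(self, key):
        # Stable hash so a key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shard_names)
        return self.shard_names[index]

    def mark_down(self, shard_name):
        # Simulate a node failure.
        self.down.add(shard_name)

    def put(self, key, value):
        shard = self.shard_for(key)
        if shard in self.down:
            raise ShardUnavailable(shard)
        self.shards[shard][key] = value

    def get(self, key):
        shard = self.shard_for(key)
        if shard in self.down:
            # Refuse to answer rather than guess: consistency over availability.
            raise ShardUnavailable(shard)
        return self.shards[shard][key]
```

After calling mark_down on a shard, get raises ShardUnavailable for every key that shard owns, while keys on the remaining shards are still served with their correct values.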
AP
To primarily support Availability and Partition Tolerance, your system may return stale or inaccurate
data, but the system will always be available, even in the face of network partitioning. DNS
is perhaps the most popular example of a system that is massively scalable, highly available,
and partition-tolerant.
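To make the "always available, possibly inaccurate" behavior concrete, here is a small Python sketch. The APStore class and its write, read, and sync methods are hypothetical names chosen for this example. It assumes that each replica answers from its own local copy and that updates spread to the other replicas only when sync is called, loosely in the way a DNS cache can keep serving an old record for a while.

```python
class APStore:
    """Toy replicated store that always answers, even with stale data."""

    def __init__(self, replica_count=2):
        # Each replica is just a local dict in this sketch.
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet propagated to other replicas

    def write(self, key, value, via=0):
        # Accept the write on one replica immediately; others learn later.
        self.replicas[via][key] = value
        self.pending.append((via, key, value))

    def read(self, key, via=0):
        # Never block or fail: return whatever this replica currently has.
        return self.replicas[via].get(key)

    def sync(self):
        # Propagate pending writes, for example after a partition heals.
        for origin, key, value in self.pending:
            for index, replica in enumerate(self.replicas):
                if index != origin:
                    replica[key] = value
        self.pending.clear()
```

If a value is written through replica 0 and then read through replica 1 before sync runs, the read still succeeds but returns the older value (or None if the key was never propagated). Once sync has run, both replicas agree.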
NOTE
Note that this depiction is intended to offer an overview that helps draw distinctions between the broader
contours in these systems; it is not strictly precise. For example, it's not entirely clear where Google's
Bigtable should be placed on such a continuum. The Google paper describes Bigtable as “highly available,”
but later goes on to say that if Chubby (the Bigtable persistent lock service) “becomes unavailable
for an extended period of time [caused by Chubby outages or network issues], Bigtable becomes
unavailable” (section 4). On the matter of data reads, the paper says that “we do not consider the
possibility of multiple copies of the same data, possibly in alternate forms due to views or indices.” Finally, the
paper indicates that “centralized control and Byzantine fault tolerance are not Bigtable goals” (section
10). Given such variable information, you can see that determining where a database falls on this sliding
scale is not an exact science.
Row-Oriented
Cassandra is frequently referred to as a “column-oriented” database, which is not incorrect. It's
not relational, and it does represent its data structures in sparse multidimensional hashtables.
“Sparse” means that for any given row you can have one or more columns, but each row doesn't
need to have all the same columns as other rows like it (as in a relational model). Each row has
a unique key, which makes its data accessible. So although it's not wrong to say that Cassandra