The effect of such a design is that during a failure, some of the data will be entirely unavailable. As Amazon CTO Werner Vogels puts it, “rather than dealing with the uncertainty of the correctness of an answer, the data is made unavailable until it is absolutely certain that it is correct” (“Dynamo: Amazon's Highly Available Key-Value Store”: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html, 207).
We could alternatively take an optimistic approach to replication, propagating updates to all replicas in the background in order to avoid blocking the client. The difficulty this approach presents is that we are now forced to detect and resolve conflicts. A design must decide to resolve these conflicts at one of two possible times: during reads or during writes. That is, a distributed database designer must choose to make the system either always readable or always writable.
Dynamo and Cassandra choose to be always writable, opting to defer the complexity of reconciliation to read operations, and realize tremendous performance gains. The alternative is to reject updates amidst network and server failures.
In Cassandra, consistency is not an all-or-nothing proposition, so we might more accurately term
it “tuneable consistency” because the client can control the number of replicas to block on for all
updates. This is done by setting the consistency level against the replication factor.
The replication factor lets you decide how much you want to pay in performance to gain more consistency. You set the replication factor to the number of nodes in the cluster you want the updates to propagate to (remember that an update means any add, update, or delete operation).
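By way of illustration, here is a minimal sketch using the DataStax Python driver (an assumption on our part; the text itself does not prescribe a client library, and the contact point and the demo keyspace name are hypothetical). The replication factor is fixed when the keyspace is created:

    # Sketch only: assumes the DataStax Python driver (pip install
    # cassandra-driver) and a locally reachable cluster; the keyspace
    # name "demo" is hypothetical.
    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])  # contact point for the cluster
    session = cluster.connect()

    # The replication factor is set per keyspace: every row written to
    # this keyspace is stored on 3 nodes, so there are 3 replicas that
    # each update must eventually propagate to.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS demo
        WITH replication = {'class': 'SimpleStrategy',
                            'replication_factor': 3}
    """)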
The consistency level is a setting that clients must specify on every operation and that allows you to decide how many replicas in the cluster must acknowledge a write operation or respond to a read operation in order for it to be considered successful. That's the part where Cassandra has pushed the decision for determining consistency out to the client.
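To make the per-operation nature of this setting concrete, here is a continuation of the sketch above, under the same assumptions (DataStax Python driver; the demo keyspace and a hypothetical users table):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(['127.0.0.1']).connect('demo')
    session.execute(
        "CREATE TABLE IF NOT EXISTS users (id int PRIMARY KEY, name text)")

    # The consistency level rides along with each statement. QUORUM means
    # a majority of the replicas (2 of 3, given the replication factor
    # above) must acknowledge the write before it is reported successful.
    write = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM)
    session.execute(write, (1, 'alice'))

    # A read can carry a different, independently chosen consistency
    # level: here a single replica's response is enough.
    read = SimpleStatement(
        "SELECT name FROM users WHERE id = %s",
        consistency_level=ConsistencyLevel.ONE)
    row = session.execute(read, (1,)).one()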
So if you like, you could set the consistency level to a number equal to the replication factor, and gain stronger consistency at the cost of synchronous blocking operations that wait for all nodes to be updated and declare success before returning. This is not often done in practice with Cassandra, however, for reasons that should be clear (it defeats the availability goal, would impact performance, and generally goes against the grain of why you'd want to use Cassandra in the first place). If, instead, the client sets the consistency level to a value less than the replication factor, the update is considered successful even if some nodes are down.
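Expressed in the same sketch, the trade-off looks like this (the driver's ConsistencyLevel constants are real; the statements themselves are illustrative):

    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    # Consistency level equal to the replication factor: block until all
    # 3 replicas acknowledge. Strongest consistency, but the write fails
    # if even one replica node is down.
    strict = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ALL)

    # Consistency level below the replication factor: one acknowledgment
    # is enough, so the write succeeds even while some replicas are
    # unavailable.
    relaxed = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ONE)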
Brewer's CAP Theorem
In order to understand Cassandra's design and its label as an “eventually consistent” database, we
need to understand the CAP theorem. The CAP theorem is sometimes called Brewer's theorem
after its author, Eric Brewer.