Last-write-wins conflict resolution
Under the hood, every time a piece of data is written to Cassandra, a timestamp is attached.
Then, when Cassandra has to deal with conflicting data as in the scenario mentioned earlier,
it simply chooses the data with the most recent timestamp. Even though the sequence of
writes we discussed earlier was concurrent from the perspective of the distributed database,
it's vanishingly unlikely that they were received by Cassandra at the exact same microsecond.
So, one of them will have the more recent timestamp, and that's the one that Cassandra
will return when we read the data back at the ALL consistency level.
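The rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Cassandra's actual implementation: the Cell type, the resolve function, and the sample values and timestamps are all hypothetical, and the sketch does not model what happens when two timestamps are exactly equal.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """One written value plus its write timestamp (hypothetical type)."""
    value: str
    timestamp: int  # microseconds; the numbers below are made up


def resolve(versions):
    """Last-write-wins: return the version with the highest timestamp."""
    return max(versions, key=lambda cell: cell.timestamp)


# Two conflicting versions of the same piece of data, as different
# replicas might report them when every replica is consulted.
replica_versions = [
    Cell("New York", 1_400_000_000_000_001),
    Cell("Boston", 1_400_000_000_000_457),
]

winner = resolve(replica_versions)
print(winner.value)
```

Because the second write carries the larger timestamp, it is the one that resolve returns, regardless of the order in which the versions are listed.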
It's important to emphasize that each column has its own timestamp value; if we issue a
query that only updates the location field in HappyCorp's user record, the location
field will carry the timestamp of that operation, but the other fields in HappyCorp's record
will still carry the timestamps of whenever they were last updated.
The ability to discretely update individual columns in a row, and have the timestamp of the
last write associated with the column rather than the whole row, is why Cassandra is able to
use the relatively simplistic last-write-wins strategy for conflict resolution. If, in the previous
scenario, Heather had updated HappyCorp's location, but Charles had updated the
email address, Cassandra would simply take the most recent version of each column,
synthesizing an up-to-date view of the row that does not yet exist on any individual replica.
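To make the column-by-column merge concrete, here is a minimal Python sketch. It assumes each replica's copy of the row can be modeled as a dictionary mapping a column name to a (value, timestamp) pair; the merge_rows function, the sample values, and the small integer timestamps are hypothetical and chosen only for readability.

```python
def merge_rows(*replicas):
    """Keep, for each column, the value with the most recent timestamp."""
    merged = {}
    for row in replicas:
        for column, (value, ts) in row.items():
            # Take this version if the column is new or this write is newer.
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged


# One replica has seen only the location update, the other only the
# email update (illustrative data, not taken from the scenario).
replica_a = {
    "location": ("Boston", 200),
    "email": ("old@happycorp.example", 100),
}
replica_b = {
    "location": ("New York", 100),
    "email": ("new@happycorp.example", 300),
}

merged = merge_rows(replica_a, replica_b)
view = {column: value for column, (value, _) in merged.items()}
print(view)
```

The resulting view combines the newer location from one replica with the newer email from the other, so it matches neither replica's stored row.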
Note
For an illuminating comparison of Cassandra's conflict resolution with strategies used by
other distributed databases, see the blog post "Why Cassandra doesn't need vector clocks" at
http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks.