How Cassandra Distributes Data - Learning Apache Cassandra


Last-write-wins conflict resolution

Under the hood, every time a piece of data is written to Cassandra, a timestamp is attached. Then, when Cassandra has to deal with conflicting data, as in the scenario mentioned earlier, it simply chooses the data with the most recent timestamp. Even though the sequence of writes we discussed earlier was concurrent from the perspective of the distributed database, it's vanishingly unlikely that they were received by Cassandra at the exact same microsecond. So, one of them will have the more recent timestamp, and that's the one that Cassandra will return when we read the data back with the ALL consistency level.
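The rule itself fits in a few lines of Python. This is only a toy model, not Cassandra's own code: the `Write` class and the `resolve` function are hypothetical names, the timestamps are made-up integers standing in for microseconds, and the handling of exactly equal timestamps is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """One versioned value as a replica might hold it (hypothetical model)."""
    value: str
    timestamp_us: int  # write timestamp, in microseconds


def resolve(conflicting: list[Write]) -> Write:
    """Last-write-wins: keep the write carrying the highest timestamp."""
    return max(conflicting, key=lambda w: w.timestamp_us)


# Two replicas disagree about the same piece of data.
replica_a = Write("value written first", timestamp_us=1_000_000)
replica_b = Write("value written second", timestamp_us=1_000_007)

winner = resolve([replica_a, replica_b])
print(winner.value)
```

The order in which the replicas answer does not matter in this sketch; only the timestamps decide which value is returned.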

It's important to emphasize that each column has its own timestamp value; if we issue a query that only updates the location field in HappyCorp's user record, the location field will carry the timestamp of that operation, but the other fields in HappyCorp's record will still carry the timestamps of whenever they were last updated.

The ability to discretely update individual columns in a row, and have the timestamp of the last write associated with the column rather than the whole row, is why Cassandra is able to use the relatively simple last-write-wins strategy for conflict resolution. If, in the previous scenario, Heather had updated HappyCorp's location, but Charles had updated the email address, Cassandra would simply take the most recent version of each column, synthesizing an up-to-date view of the row that does not yet exist on any individual replica.
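To make the column-by-column merge concrete, here is a small Python sketch. It is an illustration under stated assumptions rather than Cassandra's implementation: `merge_rows` is a hypothetical helper, each replica's row is modeled as a dictionary mapping a column name to a (value, timestamp) pair, and the email addresses, locations, and timestamps are invented for the example.

```python
# A row as one replica sees it: {column_name: (value, timestamp_us)}.
Row = dict[str, tuple[str, int]]


def merge_rows(replicas: list[Row]) -> Row:
    """Build a row by taking, for every column, the newest version on any replica."""
    merged: Row = {}
    for row in replicas:
        for column, (value, ts) in row.items():
            # Keep this version only if it is the first one seen or strictly newer.
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged


# Hypothetical state: Heather's location update has reached only replica_1,
# and Charles's email update has reached only replica_2.
replica_1: Row = {
    "email": ("old@example.com", 100),
    "location": ("New City", 205),
}
replica_2: Row = {
    "email": ("new@example.com", 210),
    "location": ("Old City", 100),
}

merged = merge_rows([replica_1, replica_2])
print(merged)
```

The merged row combines the newer location from one replica with the newer email from the other, so it matches neither replica's stored copy. Row-level timestamps could not produce this result, because one whole row would have to win.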

Note

For an illuminating comparison of Cassandra's conflict resolution with strategies used by other distributed databases, see the blog post "Why Cassandra doesn't need vector clocks" at
