Handling conflicting data
As we explored above, Cassandra's masterless replication can lead to situations in which
multiple versions of the same record exist on different nodes. Since there is no master node
containing the canonical copy of a record, Cassandra must use other means to determine
which version of the data is correct.
This situation comes into play when reading data at any consistency level other than ONE.
When our application requests a row from Cassandra, we will receive a response with that
row's data; each column will contain one value. However, if we're reading at a consistency
level such as QUORUM or ALL, Cassandra will internally fetch copies of the data from
multiple nodes, and it's possible that the different copies will contain conflicting data.
It's up to Cassandra to figure out exactly what to return to us.
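
To see why the higher consistency levels can surface conflicts, here is a minimal Python sketch of how many replicas a read must hear from at each level. The function name replicas_required and the replication factor of 3 are illustrative assumptions rather than part of any Cassandra API; the quorum value is the usual majority formula.

```python
def replicas_required(consistency_level: str, replication_factor: int) -> int:
    """Return how many replicas must respond to a read at the given level.

    Hypothetical helper for illustration only; not a Cassandra driver call.
    """
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


# Assumed replication factor of 3 for this example.
for level in ("ONE", "QUORUM", "ALL"):
    print(level, replicas_required(level, 3))
```

Any level that consults more than one replica may receive copies that disagree with one another.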
The problem is most acute when different clients are writing the same piece of data
concurrently. Let's return to a scenario we explored in Chapter 7, Expanding Your Data Model:
two employees of HappyCorp, Heather and Charles, are simultaneously attempting to update
the location field in the user record of HappyCorp's shared account. Let's suppose
that we are writing data at consistency level ONE. This concurrent operation could be
carried out via the following sequence of events:
1. Heather updates the location to New York. The update is acknowledged by Replica 1.
2. Charles updates the location to Palo Alto. The update is acknowledged by Replica 2.
Just after Heather and Charles's concurrent updates, the Replica 1 copy of the HappyCorp
user record will contain New York in its location field, and the Replica 2 copy will
contain Palo Alto. Now, before the updates have a chance to propagate to any nodes except the
ones that respectively acknowledged them, let's read the data back at the ALL consistency
level.
When Cassandra receives the read request, it will fetch HappyCorp's user record from
Replicas 1, 2, and 3. Each replica will contain a different version of the record: Replica 1's
copy has New York in the location field, Replica 2's has Palo Alto, and Replica 3's does
not contain anything in that field. So what location will Cassandra actually return to us?
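
The scenario can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: the replicas dictionary and the write_one and read_all helpers are hypothetical names, each write is modeled as landing on a single replica (the state before any propagation), and the sketch stops at collecting the versions without modeling how Cassandra chooses among them.

```python
from typing import Dict, Optional

# Each replica holds its own copy of the HappyCorp user record.
# None stands for "no value in this field yet".
replicas: Dict[str, Dict[str, Optional[str]]] = {
    "Replica 1": {"location": None},
    "Replica 2": {"location": None},
    "Replica 3": {"location": None},
}


def write_one(replica_name: str, column: str, value: str) -> None:
    """Model a write at ONE before propagation: one replica has the value."""
    replicas[replica_name][column] = value


def read_all(column: str) -> Dict[str, Optional[str]]:
    """Model the first step of a read at ALL: gather every replica's copy."""
    return {name: row[column] for name, row in replicas.items()}


# 1. Heather's update is acknowledged by Replica 1.
write_one("Replica 1", "location", "New York")
# 2. Charles's update is acknowledged by Replica 2.
write_one("Replica 2", "location", "Palo Alto")

versions = read_all("location")
for name, value in versions.items():
    print(name, value)

# Three replicas, three different answers for the same column.
distinct_versions = set(versions.values())
print(len(distinct_versions))
```

The read gathers three different values for a single column, yet the application expects exactly one.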