Distributed deletion
When we want to delete data from storage, we might assume that Cassandra simply removes the data from disk and forgets that it ever existed. In a non-distributed environment, this approach to deletion would be entirely sufficient, but deletion is a bit more complex in a distributed database like Cassandra. To find out why, let's return to, and modify, our previous scenario with Heather and Charles making concurrent modifications to HappyCorp's user record.
In our modified scenario, Heather will still be updating the location column to contain New York, but Charles will be attempting to delete the contents of that column altogether. As with the original scenario, they'll be making their respective changes at roughly the same time, and our application will issue the UPDATE and DELETE queries at the ONE consistency level. Consider the following sequence of events:
1. Heather issues a request to update the location value to New York; this is acknowledged by Replica 1.
2. Charles issues a request to delete the contents of the location column; this is acknowledged by Replica 2.
Given that Charles's request to delete the location happened just after Heather's request to update it, the correct state of the location column is to contain no data. But with a naïve approach to deletion, that won't be the outcome. Let's consider what will happen if we read the HappyCorp user record at the ALL consistency level just after Charles's and Heather's requests complete, but before the requests have propagated to replicas other than the ones that respectively acknowledged them.
When the coordinator reads all copies of the HappyCorp user record, it will see one version with a location field containing New York, one version with no data in the location column, and one version with the old location prior to either of their updates. The replica that processed the deletion has nothing left to compare, so the version with the most recent timestamp is the one containing New York, and the coordinator will determine that this is the up-to-date value of the location field. This is a violation of immediate consistency, however: since the last change to the location column was the deletion, the ALL-consistency read should reflect that fact.
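The failure described above can be reproduced with a small simulation. This is only a sketch of the naïve approach, not of Cassandra's actual storage engine: the Cell class, the naive_delete and read_all helpers, the integer timestamps, and the "old location" placeholder value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cell:
    value: str
    timestamp: int  # write time; a larger number means a more recent write


# Hypothetical model: each replica maps a column name to its stored Cell.
Replica = Dict[str, Cell]


def naive_delete(replica: Replica, column: str) -> None:
    """Naive deletion: drop the cell and keep no record that it existed."""
    replica.pop(column, None)


def read_all(replicas: List[Replica], column: str) -> Optional[str]:
    """Read at ALL: collect every replica's copy and return the newest value."""
    cells = [r[column] for r in replicas if column in r]
    if not cells:
        return None
    return max(cells, key=lambda c: c.timestamp).value


# All three replicas start with the old location, written at time 1.
replicas: List[Replica] = [{"location": Cell("old location", 1)} for _ in range(3)]

# Step 1: Heather's UPDATE at time 2, acknowledged only by Replica 1.
replicas[0]["location"] = Cell("New York", 2)

# Step 2: Charles's DELETE at time 3, acknowledged only by Replica 2.
# The deletion leaves nothing behind, so its timestamp is lost.
naive_delete(replicas[1], "location")

# The read sees "New York" (time 2), nothing, and "old location" (time 1).
result = read_all(replicas, "location")
print(result)  # prints "New York", although the deletion was the last change
```

Because the deleted copy carries no timestamp, the read has no way to tell that the deletion is newer than Heather's update.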
To deal with scenarios like this, Cassandra does not completely forget that a value ever existed when it is deleted. Instead, it stores a tombstone in place of the deleted value; that