Distributed deletion
When we want to delete data from storage, we might assume that Cassandra simply removes
the data from disk and forgets that it ever existed. In a non-distributed environment,
this approach to deletion would be entirely sufficient, but deletion is a bit more complex in
a distributed database like Cassandra. To find out why, let's return to, and modify, our
previous scenario with Heather and Charles making concurrent modifications to HappyCorp's
user record.
In our modified scenario, Heather will still be updating the location column to contain
New York, but Charles will be attempting to delete the contents of that column altogether.
As with the original scenario, they'll be making their respective changes at roughly the
same time, and our application will issue the UPDATE and DELETE queries at the ONE
consistency level. Consider the following sequence of events:
1. Heather issues a request to update the location value to New York; this is
acknowledged by Replica 1.
2. Charles issues a request to delete the contents of the location column; this is
acknowledged by Replica 2.
Given that Charles's request to delete the location happened just after Heather's request to
update it, the correct state of the location column is to contain no data. But with a naïve
approach to deletion, that won't be the outcome. Let's consider what will happen if we read
the HappyCorp user record at the ALL consistency level just after Charles's and Heather's
requests complete, but before the requests have propagated to replicas other than the ones
that respectively acknowledged them.
When the coordinator reads all copies of the HappyCorp user record, it will see one version
with a location field containing New York, one version with no data in the location
column, and one version with the old location prior to either of their updates. The version
with no data carries no record of when the deletion happened, so the most recent timestamp
the coordinator can observe belongs to the version containing New York, and it will
determine that this is the up-to-date value of the location field. This is a violation of
immediate consistency, however: since the last change to the location column was the
deletion, the ALL-consistency read should reflect that fact.
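
The failure described above can be reproduced with a small simulation. This is only a sketch of a naive last-write-wins read, not Cassandra's actual implementation: the Cell class, the naive_read function, the integer timestamps, and the "Old location" placeholder value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    """Hypothetical stand-in for one replica's copy of the location column."""
    value: str
    timestamp: int  # illustrative write time; larger means more recent


def naive_read(replica_cells: List[Optional[Cell]]) -> Optional[str]:
    """Pick the value with the latest timestamp among cells that still exist.

    A replica that naively deleted the column holds nothing at all (None),
    so its deletion cannot take part in the timestamp comparison.
    """
    present = [cell for cell in replica_cells if cell is not None]
    if not present:
        return None
    return max(present, key=lambda cell: cell.timestamp).value


# Replica 1 acknowledged Heather's update at time 2.
replica_1 = Cell("New York", 2)
# Replica 2 acknowledged Charles's delete at time 3, but naive deletion
# leaves no trace of that time behind.
replica_2 = None
# Replica 3 has seen neither change and still holds the old value
# (placeholder text; the source does not say what it was).
replica_3 = Cell("Old location", 1)

result = naive_read([replica_1, replica_2, replica_3])
print(result)  # New York, even though the delete was the last change
```

Because the deleting replica has nothing to compare, the read returns Heather's value rather than reflecting Charles's later deletion.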
To deal with scenarios like this, Cassandra does not completely forget that a value ever
existed when it is deleted. Instead, it stores a tombstone in place of the deleted value; that