Stumbling on tombstones
We've seen that tombstones are critical to ensure that Cassandra can correctly identify that
a piece of data has been deleted in a distributed environment, but tombstones also have a
downside. Since tombstones are stored in place of the deleted values, they continue to
occupy space in the range of clustering columns in a given partition. In some situations, this
can lead to unexpected performance degradation and even errors.
To use a somewhat artificial illustration, let's say that alice is now a long-time MyStatus
user, and has created tens of thousands of status updates. Let's also assume that alice
decides one day that she wants to delete 1,000 recent status updates she's created. Once she's
done with that process, we have stored a thousand tombstones, each of which is stored at
the ID of the deleted status update in the table's data structure.
The next time someone wants to read alice's user timeline, we're going to run into a
slight problem. If we ask for the 20 most recent status updates in her timeline, Cassandra will
dutifully scan the 20 most recent IDs in alice's partition. But it will find that these are all
tombstones! Luckily, Cassandra is smart enough to keep looking; it won't simply return an
empty result set. But it will have to scan a thousand tombstones before it finds any
non-deleted data. This scan carries a substantial performance overhead when compared to the
same operation with no tombstones getting in the way.
For that reason, it's best to avoid situations in which your application will need to scan over
clustering column ranges containing lots of tombstones.
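To make the cost concrete, here is a minimal Python sketch of the scan described above. It is a toy model, not Cassandra's actual read path: the Cell class and the read_latest and build_timeline helpers are hypothetical names invented for this illustration, and a partition is modeled as a plain list sorted newest-first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cell:
    """One clustering position in the toy partition (hypothetical model)."""
    status_id: int
    body: Optional[str]  # None marks a tombstone


def read_latest(partition: List[Cell], limit: int) -> Tuple[List[Cell], int]:
    """Return up to `limit` live cells plus the number of cells scanned.

    The partition is assumed to be sorted newest-first.
    """
    live: List[Cell] = []
    scanned = 0
    for cell in partition:
        if len(live) == limit:
            break
        scanned += 1
        if cell.body is not None:  # tombstones are scanned but not returned
            live.append(cell)
    return live, scanned


def build_timeline(total: int, deleted_recent: int) -> List[Cell]:
    """Build a newest-first timeline whose most recent entries are tombstones."""
    cells = []
    for i in range(total):
        status_id = total - i
        body = None if i < deleted_recent else f"status {status_id}"
        cells.append(Cell(status_id, body))
    return cells


clean = build_timeline(20_000, 0)
tombstoned = build_timeline(20_000, 1_000)

_, scanned_clean = read_latest(clean, 20)
rows, scanned_tombstoned = read_latest(tombstoned, 20)

print(scanned_clean, scanned_tombstoned)  # prints "20 1020"
```

In this model, the same 20-row read touches 1,020 cells instead of 20 once the thousand most recent updates have been deleted. The real overhead in Cassandra depends on storage details this sketch ignores, but the shape of the problem is the same.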
Note
Tombstones do not live forever. Instead, they are automatically cleared by Cassandra's
compaction process after a configured amount of time (the table's gc_grace_seconds
option) has elapsed. By default, that duration is ten days. This means that a node may rejoin
the cluster after up to ten days of downtime without a risk of deleted records springing back
to life. From our perspective as application developers, the relatively long life of
tombstones means that we are best off thinking of them as permanent, and strenuously
avoiding situations where many tombstone records must be scanned in order to service a query.
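The age check behind that grace period can be sketched in a few lines of Python. This is a simplified illustration only: is_purgeable is a hypothetical helper, timestamps are plain seconds, and the 864,000-second default corresponds to the ten days mentioned above. Real compaction applies further conditions before it drops a tombstone, so passing this check should be read as "old enough to be eligible", not "guaranteed to be gone".

```python
DEFAULT_GC_GRACE_SECONDS = 864_000  # ten days, expressed in seconds


def is_purgeable(deleted_at: int, now: int,
                 gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """Return True if a tombstone written at `deleted_at` has outlived the grace period.

    Both timestamps are seconds since an arbitrary epoch (an assumption of
    this sketch, not a statement about Cassandra's internal representation).
    """
    return now - deleted_at > gc_grace_seconds


DAY = 86_400
print(is_purgeable(deleted_at=0, now=9 * DAY))   # still within the grace period
print(is_purgeable(deleted_at=0, now=11 * DAY))  # old enough to be eligible
```

Until the grace period has passed, every read that crosses the deleted range still pays for the tombstone, which is why it is safest to treat tombstones as permanent when designing queries.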