Note
Major compaction may not be the best idea in Cassandra v0.8 and later. There are a couple of
reasons for this. One reason is that automated minor compaction no longer runs after a
major compaction is executed. This adds manual intervention or extra work (such as
setting up a cron job) to perform major compaction regularly. The other reason is that the
performance gain after major compaction may deteriorate with time, probably because the
larger the SSTable, which is what we get after major compaction, the more likely it is to
produce Bloom filter false positives. It also takes longer to perform a binary search on its
index, which is very big.
Tombstones
Cassandra is a complex system with its data distributed among commit logs, MemTables,
and SSTables on a node. The same data is then replicated over replica nodes. So, like
everything else in Cassandra, deletion is going to be eventful. Deletion, to an extent,
follows the update pattern, except that Cassandra replaces the deleted data with a special
value and marks it as a tombstone. This marker helps future queries, compaction, and
conflict resolution. Let's step further down and see what happens when a column is deleted
from a column family.
A client connected to a node (the coordinator, which may not be the node holding the data
that we are going to mutate) issues a delete command for a column C in a column family CF.
If the consistency level can be satisfied, the delete command gets processed. When a node
that holds the row key receives the delete request, it updates or inserts the column in the
MemTable with a special value, namely a tombstone. The tombstone has the same column name
as the column it replaces; its value is set to the time of deletion, expressed as Unix
epoch time. Its timestamp is set to what the client has passed. When a MemTable is flushed
to an SSTable, all tombstones go into it just as any regular column would.
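The delete path described above can be sketched in a few lines of Python. This is only a toy model to illustrate the idea, not Cassandra's actual code: the names Column, MemTable, insert, delete, and flush are hypothetical, the tombstone's stored value is assumed to be a deletion time in epoch seconds, and the rule that an equal timestamp overwrites the existing column is a simplification.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Column:
    """Toy column; a tombstone is a column flagged as deleted (hypothetical model)."""
    name: str
    value: Optional[bytes]                      # None for a tombstone
    timestamp: int                              # supplied by the client
    is_tombstone: bool = False
    local_deletion_time: Optional[int] = None   # epoch seconds, tombstones only


class MemTable:
    """In-memory map of row key -> column name -> Column."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Column]] = {}

    def _apply(self, row_key: str, column: Column) -> None:
        row = self.rows.setdefault(row_key, {})
        current = row.get(column.name)
        # Simplification: the write with the newer (or equal) timestamp wins.
        if current is None or column.timestamp >= current.timestamp:
            row[column.name] = column

    def insert(self, row_key: str, name: str, value: bytes, timestamp: int) -> None:
        self._apply(row_key, Column(name, value, timestamp))

    def delete(self, row_key: str, name: str, timestamp: int,
               now: Optional[int] = None) -> None:
        # A delete is just another write: same column name, special marker.
        deleted_at = int(time.time()) if now is None else now
        self._apply(row_key, Column(name, None, timestamp,
                                    is_tombstone=True,
                                    local_deletion_time=deleted_at))

    def flush(self) -> List[Tuple[str, str, Column]]:
        """Return a sorted, immutable-style 'SSTable' and clear the MemTable.

        Tombstones are written out exactly like regular columns.
        """
        sstable = sorted(
            (row_key, name, column)
            for row_key, columns in self.rows.items()
            for name, column in columns.items()
        )
        self.rows = {}
        return sstable
```

Note that in this sketch a delete for a column the node has never seen still inserts a tombstone, which matches the "updates or inserts" behavior described above.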
On the read side, when data is read locally on a node and the node happens to have multiple
versions of it in different SSTables, the versions are compared and the latest value is
taken as the result of reconciliation. If a tombstone turns out to be the result of
reconciliation, it is made a part of the result that this node returns. So, at this level,
if a query touches a deleted column, the tombstone exists in the result. The tombstones
are, however, filtered out of the result before it is returned to the client. So, a client
can never see a value that is a tombstone.
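The local read path above can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not Cassandra's implementation: Version, reconcile, read_local, and to_client are hypothetical names, each SSTable is modeled as a plain dictionary for a single row, and ties between equal timestamps are not modeled.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class Version(NamedTuple):
    """One stored version of a column (hypothetical model)."""
    value: Optional[bytes]
    timestamp: int
    is_tombstone: bool = False


def reconcile(versions: Iterable[Version]) -> Version:
    """Pick the version with the latest timestamp; it may be a tombstone."""
    return max(versions, key=lambda v: v.timestamp)


def read_local(sstables: List[Dict[str, Version]]) -> Dict[str, Version]:
    """Merge one row's columns across SSTables on a single node.

    Tombstones are kept in the node-level result.
    """
    merged: Dict[str, Version] = {}
    for sstable in sstables:
        for name, version in sstable.items():
            if name in merged:
                merged[name] = reconcile([merged[name], version])
            else:
                merged[name] = version
    return merged


def to_client(node_result: Dict[str, Version]) -> Dict[str, Optional[bytes]]:
    """Filter tombstones out just before the result goes to the client."""
    return {name: v.value for name, v in node_result.items()
            if not v.is_tombstone}


# Usage: column C was written at timestamp 1 and deleted at timestamp 5.
older = {"C": Version(b"v1", 1), "D": Version(b"d1", 1)}
newer = {"C": Version(None, 5, is_tombstone=True)}
node_result = read_local([older, newer])
client_result = to_client(node_result)
```

Here node_result still carries the tombstone for C, while client_result contains only column D, so the client never sees the deleted column.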
For consistency levels higher than ONE, the query is executed on as many replicas as the
consistency level. The same as a regular read process, data from the closest node and a di-