lete does not actually remove the data immediately. There's a simple reason for this: Cassandra's
durable, eventually consistent, distributed design. If Cassandra handled deletes naively and a node
went down, that node would not receive the delete. Once it came back online, it would mistakenly
conclude that every node that had received the delete had instead missed a write (the data it still
holds only because it missed the delete), and it would start repairing all of the other nodes. So
Cassandra needs a more sophisticated mechanism to support deletes. That mechanism is called a
tombstone.
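The failure mode described above can be made concrete with a small simulation. What follows is a minimal sketch in plain Java, not Cassandra code: the names TombstoneRepairSketch, Replica, Cell, repair, and afterRepair are all hypothetical, and "repair" is reduced to two replicas exchanging cells with the higher timestamp winning.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these classes are hypothetical and are not part of the Cassandra API.
final class TombstoneRepairSketch {

    // A stored cell: a value plus its write timestamp. A null value marks a tombstone.
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // One replica's local view of the data, keyed by row key.
    static final class Replica {
        final Map<String, Cell> cells = new HashMap<>();

        // Keep whichever cell carries the higher timestamp.
        void apply(String key, Cell incoming) {
            cells.merge(key, incoming,
                    (current, candidate) -> candidate.timestamp > current.timestamp ? candidate : current);
        }

        // Naive delete: physically drop the entry, leaving no trace behind.
        void naiveDelete(String key) {
            cells.remove(key);
        }

        // Tombstone delete: write a marker that carries its own timestamp.
        void tombstoneDelete(String key, long timestamp) {
            apply(key, new Cell(null, timestamp));
        }

        // A missing cell and a tombstone both read as "no data".
        String read(String key) {
            Cell cell = cells.get(key);
            return (cell == null || cell.value == null) ? null : cell.value;
        }
    }

    // Simplified repair: each replica sends the other everything it holds.
    static void repair(Replica a, Replica b) {
        new HashMap<String, Cell>(a.cells).forEach(b::apply);
        new HashMap<String, Cell>(b.cells).forEach(a::apply);
    }

    // Replica b misses the delete, then rejoins; returns what replica a reads afterwards.
    static String afterRepair(boolean useTombstone) {
        Replica a = new Replica();
        Replica b = new Replica();
        a.apply("k2", new Cell("v", 1));
        b.apply("k2", new Cell("v", 1));

        // b is down here, so only a sees the delete.
        if (useTombstone) {
            a.tombstoneDelete("k2", 2);
        } else {
            a.naiveDelete("k2");
        }

        repair(a, b); // b comes back online
        return a.read("k2");
    }

    public static void main(String[] args) {
        System.out.println("naive delete, after repair: " + afterRepair(false));
        System.out.println("tombstone,    after repair: " + afterRepair(true));
    }
}
```

With the naive delete, the rejoining replica pushes its stale copy back and the value reappears; with the tombstone, the marker's newer timestamp beats the stale value on both replicas.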
A tombstone is a special marker issued in a delete that overwrites the deleted values, acting as a
placeholder. If any replica did not receive the delete operation, the tombstone can later be
propagated to those replicas when they are available again. The net effect of this design is that
your data store will not immediately shrink in size following a delete. Each node keeps track of
the age of all its tombstones. Once a tombstone is older than the gc_grace_seconds setting (10 days
by default), it becomes eligible for garbage collection: the next compaction that processes it
discards it, and the corresponding disk space is recovered.
NOTE
Remember that SSTables are immutable, so the data is not deleted from the SSTable. On compaction,
tombstones are accounted for, merged data is sorted, a new index is created over the sorted data, and
the freshly merged, sorted, and indexed data is written to a single new file.
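As a rough illustration of how compaction and the grace period interact, here is a minimal sketch in plain Java. It is a simplification under stated assumptions, not Cassandra's implementation: CompactionSketch, Cell, compact, and demo are hypothetical names, each "SSTable" is just an in-memory map, times are plain seconds, and every input table takes part in the same compaction (real compactions usually cover only a subset of files).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: these classes are hypothetical and are not part of the Cassandra API.
final class CompactionSketch {

    // 10 days expressed in seconds, matching the default grace period described above.
    static final long GC_GRACE_SECONDS = 864_000L;

    static final class Cell {
        final String value;      // null marks a tombstone
        final long writtenAtSec; // write time in seconds (a simplification)

        Cell(String value, long writtenAtSec) {
            this.value = value;
            this.writtenAtSec = writtenAtSec;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    // Merge the inputs into one brand-new sorted map; the inputs are never modified.
    static TreeMap<String, Cell> compact(long nowSec, List<Map<String, Cell>> sstables) {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> entry : sstable.entrySet()) {
                // For each key, keep only the most recently written cell.
                merged.merge(entry.getKey(), entry.getValue(),
                        (current, candidate) -> candidate.writtenAtSec > current.writtenAtSec ? candidate : current);
            }
        }
        // Tombstones that have outlived the grace period are dropped from the output.
        merged.values().removeIf(c -> c.isTombstone() && nowSec - c.writtenAtSec >= GC_GRACE_SECONDS);
        return merged;
    }

    // Example input: "a" was deleted long ago, "b" was deleted recently, "c" is live.
    static TreeMap<String, Cell> demo() {
        Map<String, Cell> older = Map.of(
                "a", new Cell("1", 0L),
                "b", new Cell("2", 0L),
                "c", new Cell("3", 50L));
        Map<String, Cell> newer = Map.of(
                "a", new Cell(null, 100L),
                "b", new Cell(null, 900_000L));
        return compact(1_000_000L, List.of(older, newer));
    }

    public static void main(String[] args) {
        demo().forEach((key, cell) ->
                System.out.println(key + " -> " + (cell.isTombstone() ? "<tombstone>" : cell.value)));
    }
}
```

In the example, key "a" vanishes from the output entirely because its tombstone is past the grace period, while key "b" survives as a tombstone so that it can still reach any replica that missed the delete.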
The assumption is that 10 days is plenty of time for you to bring a failed node back online before
compaction runs. If you feel comfortable doing so, you can reduce that grace period to reclaim
disk space more quickly.
Let's run an example that deletes some data that we previously inserted. Note that there is
no “delete” operation in Cassandra; it's called remove, and there's really no “remove” either, just
a write (of a tombstone flag). Because a remove operation is really a tombstone write, you still
have to supply a timestamp with the operation: if multiple clients are writing, the highest
timestamp wins, and those writes might include a tombstone or a new value. Cassandra doesn't
discriminate here; whichever operation has the highest timestamp will win.
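The timestamp rule can be sketched in a few lines of plain Java. This is an illustration only, with hypothetical names (LastWriteWinsSketch, Op, resolve); it treats a remove as a write whose value is null and does not model how equal timestamps are broken.

```java
import java.util.List;

// Toy model only: these classes are hypothetical and are not part of the Cassandra API.
final class LastWriteWinsSketch {

    // One client operation on a single column: a null value stands for a tombstone write.
    static final class Op {
        final String value;
        final long timestamp;

        Op(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    // Arrival order is irrelevant; only the timestamps decide the winner.
    static Op resolve(List<Op> ops) {
        Op winner = null;
        for (Op op : ops) {
            if (winner == null || op.timestamp > winner.timestamp) {
                winner = op;
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        // The remove at t=20 beats both writes, even though a write arrives after it.
        Op first = resolve(List.of(new Op("v1", 10), new Op(null, 20), new Op("v2", 15)));
        System.out.println(first.isTombstone() ? "deleted" : first.value);

        // A later write at t=30 beats the earlier remove.
        Op second = resolve(List.of(new Op(null, 20), new Op("v3", 30)));
        System.out.println(second.isTombstone() ? "deleted" : second.value);
    }
}
```

This is why a remove issued with a timestamp lower than an existing write has no visible effect: the tombstone loses the comparison like any other stale write.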
A simple delete looks like this:
Connector conn = new Connector();
Cassandra.Client client = conn.connect();
String columnFamily = "Standard1";
byte[] key = "k2".getBytes(); //this is the row key
Clock clock = new Clock(System.currentTimeMillis());