Reading and Writing Data - Cassandra: The Definitive Guide


A delete does not actually remove the data immediately. There's a simple reason for this: Cassandra's durable, eventually consistent, distributed design. If Cassandra handled deletes in the straightforward way and a node went down, that node would not receive the delete. Once it came back online, it would mistakenly conclude that every node that had received the delete had instead missed a write (the data it still holds only because it missed the delete), and it would start repairing all of the other nodes, bringing the deleted data back. So Cassandra needs a more sophisticated mechanism to support deletes. That mechanism is called a tombstone.
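
The failure described above can be reproduced with a toy model. This is a minimal sketch, not Cassandra code: the replicas are plain maps, and `naiveRepair` is a hypothetical repair step that treats any missing key as a missed write.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, not Cassandra code: each replica is a plain map.
final class NaiveDeleteDemo {

    // Hypothetical repair step: a key missing on one side is assumed
    // to be a missed write and is copied over from the other side.
    static void naiveRepair(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            b.putIfAbsent(e.getKey(), e.getValue());
        }
        for (Map.Entry<String, String> e : b.entrySet()) {
            a.putIfAbsent(e.getKey(), e.getValue());
        }
    }

    // Runs the scenario and reports whether the deleted key came back.
    static boolean deletedKeyReappears() {
        Map<String, String> nodeA = new HashMap<>();
        Map<String, String> nodeB = new HashMap<>();
        nodeA.put("k2", "v");
        nodeB.put("k2", "v");

        // nodeB is down, so only nodeA processes the delete.
        nodeA.remove("k2");

        // nodeB comes back online and the replicas are repaired.
        naiveRepair(nodeA, nodeB);
        return nodeA.containsKey("k2");
    }

    public static void main(String[] args) {
        System.out.println("deleted key reappears: " + deletedKeyReappears());
    }
}
```

Because a plain removal leaves no record behind, the repair step cannot tell "deleted" apart from "never written", and the stale copy wins.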

A tombstone is a special marker, written by a delete, that overwrites the deleted values and acts as a placeholder. If any replica did not receive the delete operation, the tombstone can later be propagated to that replica when it is available again. The net effect of this design is that your data store will not immediately shrink in size following a delete. Each node keeps track of the age of all its tombstones. Once a tombstone reaches the age configured in gc_grace_seconds (10 days by default), the next compaction that processes it garbage-collects the tombstone, and the corresponding disk space is recovered.

NOTE

Remember that SSTables are immutable, so the data is not deleted from the SSTable. On compaction, tombstones are accounted for, merged data is sorted, a new index is created over the sorted data, and the freshly merged, sorted, and indexed data is written to a single new file.
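
The tombstone lifecycle and the compaction step can be sketched together. This is a simplified illustration under stated assumptions, not Cassandra's implementation: `Cell`, `compact`, and the `gcBefore` cutoff are hypothetical names, a null value stands in for a tombstone, and a single timestamp in seconds serves as both write time and deletion time.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class CompactionSketch {

    // Hypothetical cell: a null value marks a tombstone.
    static final class Cell {
        final String value;
        final long timestamp; // seconds, for simplicity

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    // Merges the input tables into a brand-new sorted table. The inputs
    // are only read, never modified, mirroring SSTable immutability.
    static SortedMap<String, Cell> compact(List<SortedMap<String, Cell>> inputs, long gcBefore) {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (SortedMap<String, Cell> table : inputs) {
            for (Map.Entry<String, Cell> e : table.entrySet()) {
                // Keep the cell with the highest timestamp for each key.
                merged.merge(e.getKey(), e.getValue(),
                        (older, newer) -> newer.timestamp > older.timestamp ? newer : older);
            }
        }
        // Assumed rule: tombstones written before the cutoff
        // (now minus the grace period) are dropped; younger ones are kept.
        merged.values().removeIf(c -> c.isTombstone() && c.timestamp < gcBefore);
        return Collections.unmodifiableSortedMap(merged);
    }

    // Small scenario: k2 was deleted long ago, k3 was deleted recently.
    static SortedMap<String, Cell> demo() {
        SortedMap<String, Cell> older = new TreeMap<>();
        older.put("k1", new Cell("a", 100));
        older.put("k2", new Cell("b", 100));

        SortedMap<String, Cell> newer = new TreeMap<>();
        newer.put("k2", new Cell(null, 200)); // tombstone older than the cutoff
        newer.put("k3", new Cell(null, 900)); // tombstone still within the grace period

        return compact(Arrays.asList(older, newer), 500);
    }

    public static void main(String[] args) {
        System.out.println("keys after compaction: " + demo().keySet());
    }
}
```

In this scenario the old tombstone for k2 shadows the earlier value and is then dropped, so k2 disappears entirely, while the recent tombstone for k3 survives so that it can still reach replicas that missed the delete.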

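The assumption is that 10 days is plenty of time for you to bring a failed node back online before its missed tombstones are garbage-collected on the other nodes. If you are comfortable doing so, you can reduce that grace period to reclaim disk space more quickly, but a node that stays down longer than the grace period can bring deleted data back when it rejoins.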

Let's run an example that deletes some data that we previously inserted. Note that there is no “delete” operation in Cassandra; the operation is called remove, and there's really no “remove” either: it's just a write (of a tombstone flag). Because a remove operation is really a tombstone write, you still have to supply a timestamp with the operation. If multiple clients are writing, those writes might include a tombstone or a new value, and Cassandra doesn't discriminate between them; whichever operation has the highest timestamp wins.
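
The highest-timestamp-wins rule can be illustrated in a few lines. This is a minimal sketch with hypothetical names (`Write`, `reconcile`), not Cassandra's reconciliation code; in particular, tie-breaking on equal timestamps is simplified here to keeping the first argument.

```java
final class LastWriteWins {

    // Hypothetical write: a null value represents a tombstone.
    static final class Write {
        final String value;
        final long timestamp;

        private Write(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        static Write value(String v, long timestamp) {
            return new Write(v, timestamp);
        }

        static Write tombstone(long timestamp) {
            return new Write(null, timestamp);
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    // Tombstones and values are compared the same way: by timestamp only.
    // Equal timestamps keep the first argument (a simplification).
    static Write reconcile(Write a, Write b) {
        return b.timestamp > a.timestamp ? b : a;
    }

    public static void main(String[] args) {
        Write deleted = reconcile(Write.value("v1", 10), Write.tombstone(20));
        Write rewritten = reconcile(Write.tombstone(20), Write.value("v2", 30));
        System.out.println("tombstone wins over older value: " + deleted.isTombstone());
        System.out.println("newer value wins over tombstone: " + !rewritten.isTombstone());
    }
}
```

This is why a remove with a stale timestamp can silently lose to an earlier-issued write that carries a higher timestamp.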

A simple delete looks like this:

Connector conn = new Connector();
Cassandra.Client client = conn.connect();

String columnFamily = "Standard1";
byte[] key = "k2".getBytes(); // this is the row key
// the remove is a tombstone write, so it carries a timestamp like any other write
Clock clock = new Clock(System.currentTimeMillis());
