Note
For more information, refer to The Log-Structured Merge-Tree (LSM-Tree) (1996) by
Patrick O'Neil and others at
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.2782.
The preceding paper suggests multi-component LSM trees, where data from memory is
flushed into a smaller tree on disk for a quicker merge. When this tree fills up, its
contents are rolled into a bigger tree. So, if you have K trees, with the first being the
smallest and the Kth the largest, memory gets flushed into the first tree, which, when
full, performs a rolling merge into the second tree, and so on. A change eventually ends
up in the Kth tree. This is a background process (similar to the compaction process in
Cassandra). Cassandra differs slightly: memory-resident data is flushed into immutable
SSTables, which are eventually merged into one big SSTable by a background process.
As with any other disk-resident access tree, popular pages are buffered in memory for
faster access. Cassandra has a similar concept in its key cache and (optional) row cache
mechanisms.
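The rolling merge described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the paper's algorithm or Cassandra's code: the class name ToyMultiComponentLSM, the size limits, and the use of plain dictionaries in place of on-disk trees are all made up for illustration, and the merge runs inline rather than in the background.

```python
class ToyMultiComponentLSM:
    """Toy K-component LSM tree (hypothetical, for illustration only).

    Dictionaries stand in for the in-memory component and the K on-disk
    trees. The last tree is treated as unbounded.
    """

    def __init__(self, memory_limit=2, level_limits=(4, 8)):
        self.memory_limit = memory_limit
        # Limits for trees 1..K-1; tree K (the largest) has no limit here.
        self.level_limits = level_limits
        self.memory = {}
        self.levels = [{} for _ in range(len(level_limits) + 1)]

    def put(self, key, value):
        self.memory[key] = value
        if len(self.memory) >= self.memory_limit:
            self._flush()

    def _flush(self):
        # Memory is flushed into the first (smallest) tree.
        self.levels[0].update(self.memory)
        self.memory = {}
        # A full tree rolls its contents into the next, bigger tree.
        for i, limit in enumerate(self.level_limits):
            if len(self.levels[i]) >= limit:
                # Newer values from tree i overwrite older ones in tree i+1.
                self.levels[i + 1].update(self.levels[i])
                self.levels[i] = {}

    def get(self, key):
        # Newest data lives in memory, then in the smaller trees.
        if key in self.memory:
            return self.memory[key]
        for level in self.levels:
            if key in level:
                return level[key]
        return None
```

Reads check memory first and then the trees from smallest to largest, so the most recent value for a key always wins, even while an older copy still sits in a larger tree.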
We'll see the LSM tree in action in the context of Cassandra in the next three sections.
Commit log
One of the promises that Cassandra makes to its end users is durability. In conventional
terms (or in ACID terminology), durability guarantees that a successful transaction (write,
update) will survive permanently. This means that once Cassandra says write successful,
the data is persisted and will survive system failures. This is done the same way as in any
DBMS that guarantees durability: by writing replayable information to a file before
acknowledging a successful write. This log is called the commit log in the Cassandra
realm.
This is what happens under the hood: any write to a node gets tracked by
org.apache.cassandra.db.commitlog.CommitLog, which writes the data along with
certain metadata into the commit log file in such a manner that replaying it will
recreate the data. The purpose of this exercise is to ensure there is no data loss. If, for
some reason, the data cannot make it into a MemTable or SSTable, the system can replay
the commit log to recreate the data.
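The log-then-replay idea can be illustrated with a minimal Python sketch. It does not reproduce Cassandra's CommitLog class, file format, or sync policy: ToyCommitLog, ToyNode, the one-JSON-record-per-line layout, and the fsync on every append are assumptions chosen to keep the example short.

```python
import json
import os


class ToyCommitLog:
    """Hypothetical append-only log: one JSON record per line."""

    def __init__(self, path):
        self.path = path

    def append(self, key, value):
        record = json.dumps({"key": key, "value": value})
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()
            # Force the record to disk before the write is acknowledged
            # (an assumption of this sketch, not a claim about Cassandra).
            os.fsync(f.fileno())

    def replay(self):
        """Rebuild the in-memory state by re-applying every logged write."""
        state = {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                state[entry["key"]] = entry["value"]
        return state


class ToyNode:
    """Hypothetical node: recovers its MemTable stand-in from the log."""

    def __init__(self, log_path):
        self.commit_log = ToyCommitLog(log_path)
        self.memtable = self.commit_log.replay()

    def write(self, key, value):
        self.commit_log.append(key, value)  # log first
        self.memtable[key] = value          # then update memory
        return "write successful"
```

If a ToyNode is discarded after a few writes and a new one is created on the same log path, the new node's memtable holds the same data, which is the recovery behavior the paragraph above describes.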
Commit log, MemTable, and SSTable in a node are tightly coupled. Any write operation
gets written to the commit log first and then the MemTable gets updated. MemTable,