Cassandra Architecture - Mastering Apache Cassandra


Note

For more information, refer to The Log-Structured Merge-Tree (LSM-Tree) (1996) by Patrick O'Neil and others at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.2782.

The preceding paper proposes multi-component LSM trees, where data from memory is flushed into a smaller tree on disk for a quicker merge. When this tree fills up, it is rolled into a bigger tree. So, if you have K trees, with the first tree being the smallest and the Kth being the largest, the memory gets flushed into the first tree, which, when full, performs a rolling merge into the second tree, and so on. A change eventually ends up in the Kth tree. This is a background process (similar to the compaction process in Cassandra). Cassandra differs slightly in that memory-resident data is flushed into immutable SSTables, which are eventually merged into one big SSTable by a background process. As with any other disk-resident access tree, popular pages are buffered in memory for faster access. Cassandra has a similar concept with its key cache and (optional) row cache mechanisms.
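The rolling merge described above can be illustrated with a toy model. This is a minimal sketch, not the paper's algorithm or Cassandra's implementation: the class name TinyLSM, the use of plain dictionaries in place of on-disk trees, and the capacity rule (each level holds `growth` times more than the previous one) are all assumptions made for illustration.

```python
class TinyLSM:
    """Toy multi-component LSM tree: one in-memory table plus K levels.

    Dicts stand in for the on-disk trees; all names here are hypothetical.
    """

    def __init__(self, num_levels=3, memtable_limit=2, growth=2):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        # levels[0] is the smallest tree, levels[-1] the largest (the Kth).
        self.levels = [{} for _ in range(num_levels)]
        # Assumed capacity rule: every level is `growth` times larger.
        self.capacity = [memtable_limit * growth ** (i + 1)
                         for i in range(num_levels)]

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Memory is flushed into the first (smallest) tree.
        self.levels[0].update(self.memtable)
        self.memtable = {}
        # Rolling merge: a full level spills into the next, larger level.
        # Newer entries overwrite older ones with the same key.
        for i in range(len(self.levels) - 1):
            if len(self.levels[i]) >= self.capacity[i]:
                self.levels[i + 1].update(self.levels[i])
                self.levels[i] = {}

    def get(self, key):
        # Check the newest data first: memory, then smallest to largest level.
        if key in self.memtable:
            return self.memtable[key]
        for level in self.levels:
            if key in level:
                return level[key]
        return None
```

With the default sizes, writing eight distinct keys pushes every one of them through the first and second levels and into the third, while a later write to an existing key is served from memory until the next flush.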

We'll see the LSM tree in action in the context of Cassandra in the next three sections.

Commit log

One of the promises that Cassandra makes to its end users is durability. In conventional terms (or in ACID terminology), durability guarantees that a successful transaction (write, update) will survive permanently. This means that once Cassandra says "write successful", the data is persisted and will survive system failures. This is done the same way as in any DBMS that guarantees durability: by writing replayable information to a file before acknowledging the write as successful. This log is called the commit log in the Cassandra realm.

This is what happens under the hood: any write to a node gets tracked by org.apache.cassandra.db.commitlog.CommitLog, which writes the data along with certain metadata into the commit log file in such a manner that replaying the file will recreate the data. The purpose of this exercise is to ensure that there is no data loss. If, for some reason, the data cannot make it into the MemTable or an SSTable, the system can replay the commit log to recreate the data.
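The idea of logging a replayable record before acknowledging a write, and replaying it after a failure, can be sketched as follows. This is a simplified illustration and not how org.apache.cassandra.db.commitlog.CommitLog is implemented: the class TinyCommitLog, the JSON-lines record format, and the choice to sync the file on every write are assumptions; Cassandra's actual on-disk format and sync policy are not modelled here.

```python
import json
import os


class TinyCommitLog:
    """Toy node: append to a log file first, then update an in-memory table.

    Hypothetical names and record format, for illustration only.
    """

    def __init__(self, path):
        self.path = path
        self.memtable = {}

    def write(self, key, value):
        # 1. Append a replayable record and force it to disk.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then update the in-memory table and acknowledge.
        self.memtable[key] = value
        return "write successful"

    def replay(self):
        # Rebuild the in-memory table by re-applying every logged record
        # in order, so later writes to a key win over earlier ones.
        recovered = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    recovered[record["key"]] = record["value"]
        self.memtable = recovered
        return recovered
```

A fresh TinyCommitLog pointed at the same file starts with an empty in-memory table, and calling replay() restores the latest value for each key that was acknowledged before the restart.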

Commit log, MemTable, and SSTable in a node are tightly coupled. Any write operation

gets written to the commit log first and then the MemTable gets updated. MemTable,
