The Cassandra Architecture - Cassandra: The Definitive Guide


not make it to the in-memory store (the memtable, discussed in a moment), it will still be

possible to recover the data.

After it's written to the commit log, the value is written to a memory-resident data structure

called the memtable. When the number of objects stored in the memtable reaches a threshold,

the contents of the memtable are flushed to disk in a file called an SSTable. A new memtable is

then created. This flushing is a nonblocking operation; multiple memtables may exist for a single

column family, one current and the rest waiting to be flushed. They typically should not have to

wait very long, as the node should flush them very quickly unless it is overloaded.
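
The write path described above can be sketched as a toy model. This is only an illustration under stated assumptions, not Cassandra's implementation: the class name, `flush_threshold`, `pending_flush`, and `flush_pending` are hypothetical, and plain Python lists and tuples stand in for the on-disk commit log and SSTables.

```python
class ToyColumnFamilyStore:
    """Toy model of the write path: commit log -> memtable -> SSTable."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: max objects per memtable
        self.commit_log = []     # append-only record of every write
        self.memtable = {}       # the current in-memory table
        self.pending_flush = []  # full memtables waiting to be flushed
        self.sstables = []       # sorted, immutable stand-ins for on-disk files

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. commit log first, for durability
        self.memtable[key] = value            # 2. then the memtable
        if len(self.memtable) >= self.flush_threshold:
            # 3. Swap in a new memtable so writes never block on the flush;
            #    the full memtable waits in line to be flushed.
            self.pending_flush.append(self.memtable)
            self.memtable = {}

    def flush_pending(self):
        """Turn each waiting memtable into a sorted, immutable SSTable."""
        while self.pending_flush:
            full = self.pending_flush.pop(0)
            self.sstables.append(tuple(sorted(full.items())))


store = ToyColumnFamilyStore(flush_threshold=2)
for key, value in [("b", 1), ("a", 2), ("c", 3)]:
    store.write(key, value)
store.flush_pending()
```

In this run, the first two writes fill a memtable that is flushed as one sorted SSTable, while the third write lands in the fresh memtable; all three writes remain in the commit log.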

Each commit log maintains an internal bit flag per column family to indicate whether it needs

flushing. When a write operation is first received, it is written to the commit log and the bit

flag for its column family is set to 1. One flag per column family is enough because only one

commit log is ever being written to across the entire server: all writes to all column families

go into the same commit log, so each flag indicates whether that commit log contains anything

that hasn't been flushed for a particular column family. Once the memtable has been properly

flushed to disk, the corresponding bit flag in the commit log is set to 0, indicating that the

commit log no longer has to maintain that data for durability purposes. Like regular logfiles,

commit logs have a configurable rollover threshold; once this file size threshold is reached,

the log rolls over, carrying with it any extant dirty bit flags.
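
The bit-flag bookkeeping can be sketched as follows. This is a minimal illustration, not Cassandra's actual data structure: the class and method names are hypothetical, and `can_discard` is an assumption that follows from the text (once every flag is 0, the log no longer holds data needed for durability).

```python
class ToyCommitLog:
    """One shared commit log with a dirty bit flag per column family."""

    def __init__(self, column_families):
        self.entries = []  # every write, for every column family
        self.dirty = {cf: 0 for cf in column_families}

    def append(self, column_family, key, value):
        self.entries.append((column_family, key, value))
        self.dirty[column_family] = 1  # this log now holds unflushed data

    def mark_flushed(self, column_family):
        # Called after the column family's memtable has reached disk.
        self.dirty[column_family] = 0

    def can_discard(self):
        # Assumption: the log is no longer needed once no flag is set.
        return not any(self.dirty.values())


log = ToyCommitLog(["users", "events"])
log.append("users", "k1", "v1")
log.append("events", "k2", "v2")

log.mark_flushed("users")
discard_after_one_flush = log.can_discard()    # "events" is still dirty

log.mark_flushed("events")
discard_after_both_flushes = log.can_discard()
```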

The SSTable is a concept borrowed from Google's Bigtable. Once a memtable is flushed to disk

as an SSTable, it is immutable and cannot be changed by the application. Although SSTables

are compacted, compaction changes only their on-disk representation; it essentially

performs the “merge” step of a mergesort into new files and removes the old files on success.
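
The "merge" step can be sketched with Python's standard `heapq.merge`, which lazily merges inputs that are already sorted. This is a simplified illustration: the `compact` function is hypothetical, and the rule that the newest table wins for a duplicated key is an assumption made here, with table order standing in for whatever ordering a real system would use to pick the latest value.

```python
import heapq


def compact(sstables):
    """Merge sorted SSTables (listed oldest first) into one new sorted SSTable.

    The inputs are never modified; a new tuple is returned.
    """
    # Tag each entry with the negated age of its table so that, among equal
    # keys, the entry from the newest table sorts first.
    tagged = (
        [(key, -age, value) for key, value in table]
        for age, table in enumerate(sstables)
    )
    merged = []
    for key, _, value in heapq.merge(*tagged):
        if not merged or merged[-1][0] != key:  # keep only the first (newest) entry
            merged.append((key, value))
    return tuple(merged)


old = (("a", 1), ("c", 3))
new = (("a", 9), ("b", 2))
compacted = compact([old, new])
```

Because each input is already sorted, the merge reads every table front to back exactly once, which is what allows compaction to rely on sequential I/O.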

NOTE

The name “SSTable”, a contraction of “Sorted String Table”, is something of a misnomer for

Cassandra, because the data is not stored as strings on disk.

Each SSTable also has an associated Bloom filter, which is used as an additional performance

enhancer (see Bloom Filters).

All writes are sequential, which is the primary reason that writes perform so well in Cassandra.

No reads or seeks of any kind are required for writing a value to Cassandra because all writes are

append operations. This makes the speed of your disk one key limitation on performance.

Compaction is intended to amortize the reorganization of data, but it uses sequential I/O to do

so. The performance benefit therefore comes from splitting the work in two: the write operation

is just an immediate append, and compaction later organizes the data for better future read

performance. If Cassandra naively

inserted values where they ultimately belonged, writing clients would pay for seeks up front.
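
The trade-off can be illustrated with a small in-memory analogy. This is only a sketch: list positions stand in for disk locations, and the function names are hypothetical. Writing in place requires finding the right position before every write, whereas appending always writes at the end and leaves the organizing for a later pass.

```python
import bisect


def write_in_place(sorted_rows, key):
    """Naive approach: find where the key belongs, then insert it there."""
    position = bisect.bisect_left(sorted_rows, key)  # lookup needed before writing
    sorted_rows.insert(position, key)
    return position


def write_append(log, key):
    """Append-only approach: always write at the end, no lookup needed."""
    log.append(key)
    return len(log) - 1


keys = ["m", "c", "x", "a"]
in_place, log = [], []
in_place_positions = [write_in_place(in_place, k) for k in keys]
append_positions = [write_append(log, k) for k in keys]

# A later pass (the role compaction plays) organizes the appended data.
organized_later = sorted(log)
```

Both approaches end with the same ordered data; the in-place writer jumps around to reach it, while the appending writer only ever moves forward.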
