value. A tombstone is a deletion marker that is required to suppress older data in SSTables until
compaction can run.
There's a related setting called Garbage Collection Grace Seconds. This is the amount of time
the server will wait before garbage-collecting a tombstone. By default, it's set to 864,000 seconds,
the equivalent of 10 days. Cassandra keeps track of tombstone age, and once a tombstone is older
than GCGraceSeconds, it will be garbage-collected. The purpose of this delay is to give an
unavailable node time to recover; if a node is down longer than this value, it is treated as
failed and replaced.
As of 0.7, this setting is configurable per column family (it used to be for the whole keyspace).
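To make the rule concrete, here is a minimal sketch in Java of the eligibility check described above. It is purely illustrative: the class and method names (Tombstone, isCollectable) are hypothetical and do not correspond to Cassandra's actual compaction code.

    import java.time.Duration;
    import java.time.Instant;

    public class GcGraceExample {

        // Default grace period: 864,000 seconds (10 days)
        static final long GC_GRACE_SECONDS = 864_000;

        record Tombstone(String key, Instant deletedAt) {}

        // A tombstone may be dropped during compaction only once it is older than
        // GCGraceSeconds; until then it must be kept so that a replica that missed
        // the delete can still learn about it when it comes back.
        static boolean isCollectable(Tombstone t, Instant now) {
            return Duration.between(t.deletedAt(), now).getSeconds() > GC_GRACE_SECONDS;
        }

        public static void main(String[] args) {
            Instant now = Instant.now();
            Tombstone fresh = new Tombstone("user:42", now.minus(Duration.ofDays(2)));
            Tombstone stale = new Tombstone("user:99", now.minus(Duration.ofDays(11)));
            System.out.println(isCollectable(fresh, now));  // false: still inside the grace period
            System.out.println(isCollectable(stale, now));  // true: eligible for collection
        }
    }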
Staged Event-Driven Architecture (SEDA)
Cassandra implements a Staged Event-Driven Architecture (SEDA). SEDA is a general architec-
ture for highly concurrent Internet services, originally proposed in a 2001 paper called “SEDA:
An Architecture for Well-Conditioned, Scalable Internet Services” by Matt Welsh, David Culler,
and Eric Brewer (whom you might recall from our discussion of the CAP theorem).
NOTE
You can read the original SEDA paper at http://www.eecs.harvard.edu/~mdw/proj/seda .
In a typical application, a single unit of work is often performed within the confines of a
single thread. A write operation, for example, will start and end within the same thread. Cas-
sandra, however, is different: its concurrency model is based on SEDA, so a single operation
may start with one thread, which then hands off the work to another thread, which may hand
it off to other threads. But it's not up to the current thread to hand off the work to another
thread. Instead, work is subdivided into what are called stages, and the thread pool (really, a
java.util.concurrent.ExecutorService) associated with the stage determines execution.
A stage is a basic unit of work, and a single operation may internally transition from one
stage to the next. Because each stage can be handled by a different thread pool, the stages of
many operations can execute concurrently, which gives Cassandra a significant performance
improvement. The SEDA design also lets Cassandra manage its own resources more effectively:
different operations might require disk I/O, might be CPU-bound, or might be network
operations, so each pool can schedule its work according to the availability of the resource it
depends on.
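The hand-off pattern can be sketched with two hypothetical stages, each backed by its own ExecutorService. This is only an illustration of the idea, not Cassandra's actual stage implementation; the stage names and pool sizes are invented for the example.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class SedaSketch {

        // Each stage owns its own queue and thread pool, so disk-, CPU-, and
        // network-bound work can be sized and scheduled independently.
        static final ExecutorService MUTATION_STAGE = Executors.newFixedThreadPool(4);
        static final ExecutorService RESPONSE_STAGE = Executors.newFixedThreadPool(2);

        static void write(String key, String value) {
            // Stage 1: apply the write on the mutation stage's pool...
            MUTATION_STAGE.submit(() -> {
                System.out.println(Thread.currentThread().getName()
                        + " applied " + key + "=" + value);
                // ...then hand the rest of the operation off to the next stage,
                // rather than finishing it on the current thread.
                RESPONSE_STAGE.submit(() -> System.out.println(
                        Thread.currentThread().getName() + " acknowledged " + key));
            });
        }

        public static void main(String[] args) throws InterruptedException {
            write("user:42", "some value");
            MUTATION_STAGE.shutdown();
            MUTATION_STAGE.awaitTermination(5, TimeUnit.SECONDS);
            RESPONSE_STAGE.shutdown();
            RESPONSE_STAGE.awaitTermination(5, TimeUnit.SECONDS);
        }
    }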
A stage consists of an incoming event queue, an event handler, and an associated thread
pool. Stages are managed by a controller that determines scheduling and thread allocation;
Cassandra implements this kind of concurrency model using the thread pool
java.util.concurrent.ExecutorService.