Glossary
I can't believe what a bunch of nerds we are. We're looking up “money laundering” in a dictionary.
—Peter, Office Space
This glossary provides definitions of some of the terms that are important to understand when
working with Apache Cassandra. There's some really good material at http://wiki.apache.org/cassandra,
but reading it for the first time can be tricky, as each new term seems to be explained
only with other new terms. Many of these concepts are daunting to beginning or even intermediate
web developers or database administrators, so they're presented here in an easy reference.
Much of the information in this glossary is repeated and expanded upon in relevant sections
throughout this topic.
Anti-Entropy
Anti-entropy, or replica synchronization, is the mechanism in Cassandra for ensuring that
data on different nodes is updated to the newest version.
Here's how it works. During a major compaction (see Compaction), the server initiates a
TreeRequest/TreeResponse conversation to exchange Merkle trees with neighboring nodes.
The Merkle tree is a tree of hashes summarizing the data in that column family. If the trees
from the different nodes don't match, then they have to be reconciled (or “repaired”) in order
to determine the latest data values they should all be set to. This tree comparison validation is
the responsibility of the org.apache.cassandra.service.AntiEntropyService class.
AntiEntropyService implements the Singleton pattern and defines the static Differencer
class as well. This class is used to compare two trees, and if it finds any differences, it
launches a repair for the ranges that don't agree.
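
The comparison step can be illustrated with a small sketch. This is not Cassandra's AntiEntropyService or Differencer code; it is a simplified Python model that assumes each replica's data has already been split into the same number of ranges. The function names (leaf_hash, build_tree, differing_ranges) and the sample data are hypothetical and chosen only for illustration.

```python
import hashlib


def leaf_hash(rows):
    """Hash the rows of one range (a dict of key -> value) in key order."""
    digest = hashlib.sha256()
    for key in sorted(rows):
        digest.update(repr((key, rows[key])).encode("utf-8"))
    return digest.digest()


def build_tree(leaf_hashes):
    """Return the tree as a list of levels: leaves first, root last."""
    levels = [list(leaf_hashes)]
    if not levels[0]:
        raise ValueError("at least one range is required")
    while len(levels[-1]) > 1:
        prev = levels[-1]
        parents = []
        for i in range(0, len(prev), 2):
            left = prev[i]
            # An unpaired last node is combined with itself.
            right = prev[i + 1] if i + 1 < len(prev) else left
            parents.append(hashlib.sha256(left + right).digest())
        levels.append(parents)
    return levels


def differing_ranges(tree_a, tree_b):
    """Walk both trees from the root and return indexes of mismatched ranges."""
    if len(tree_a[0]) != len(tree_b[0]):
        raise ValueError("trees must cover the same number of ranges")
    mismatched = []
    stack = [(len(tree_a) - 1, 0)]  # (level, index), starting at the root
    while stack:
        level, idx = stack.pop()
        if tree_a[level][idx] == tree_b[level][idx]:
            continue  # identical subtree: nothing below needs checking
        if level == 0:
            mismatched.append(idx)
            continue
        for child in (2 * idx, 2 * idx + 1):
            if child < len(tree_a[level - 1]):
                stack.append((level - 1, child))
    return sorted(mismatched)


# Hypothetical data: two replicas, four ranges each, disagreeing on range 2.
replica_a = [{"k1": "v1"}, {"k2": "v2"}, {"k3": "v3"}, {"k4": "v4"}]
replica_b = [{"k1": "v1"}, {"k2": "v2"}, {"k3": "v3-new"}, {"k4": "v4"}]

tree_a = build_tree(leaf_hash(r) for r in replica_a)
tree_b = build_tree(leaf_hash(r) for r in replica_b)
print(differing_ranges(tree_a, tree_b))  # prints [2]
```

Because matching subtrees are skipped as soon as their hashes agree, only the ranges that actually differ need to be repaired, which is the role the glossary entry assigns to the Differencer.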
Anti-entropy is used in Amazon's Dynamo, and Cassandra's implementation is modeled on
that (see Section 4.7 of the Dynamo paper).
Dynamo uses a Merkle tree for anti-entropy (see Merkle Tree). Cassandra does too,
but the implementation is a little different. In Cassandra, each column family has its own
Merkle tree; the tree is created as a snapshot during a major compaction operation, and it is
kept only as long as is required to send it to the neighboring nodes on the ring. The advantage
of this implementation is that it reduces disk I/O.
See Read Repair for more information on how these repairs occur.
Async Write
Sometimes called “async writes” in documentation and user lists, this simply means “asyn-
chronous writes” and refers to the fact that Cassandra makes heavy use of