Database Reference
In-Depth Information
and Voldemort, and many of the concepts introduced in this section are
applicable to those other technologies. The internal data model used by
Cassandra is also covered, primarily for historical reasons, because CQL
abstracts much of these internals with its own somewhat incompatible data
model.
Server Architecture
The Cassandra server architecture is a “master-less” cluster of nodes, the
goal being to eliminate single points of failure and allow linear scaling.
To achieve this, Cassandra uses a distributed hash table approach to data
management. In this model, each row of data is assigned a partition key.
This key defines which server stores a particular piece of information. To
improve data durability, Cassandra can also replicate this data to other
nodes through consistent hashing.
Early versions (prior to 1.2) used a server-based consistent hashing
technique that required a fair bit of maintenance to calculate and assign
tokens to each node to tell it which parts of the hash range each node would
store. Newer versions of Cassandra introduced the concept of the virtual
node, called a “vnode,” which breaks the hash range into much smaller
pieces that are then randomly distributed among the nodes.
Each of these virtual nodes maintains its own indexes for the data contained
within that partition as well as a specialized data structure called a Bloom
Filter (discussed in detail in Chapter 10, “Approximating Streaming Data
with Sketching”) that helps to quickly determine if a query needs any data
from a particular virtual node.
A node in a cluster maintains a map of other servers through a peer-to-peer
communication protocol. Called the “gossip” protocol, each server
exchanges state information with up to three other servers in the cluster
roughly once every second. This information also contains state information
for other servers, so a given node will quickly learn the entire cluster
topology as its gossips with its peers.
In addition to the gossip protocol, the “snitch” protocol determines the
local network topology. This helps Cassandra optimize its request routing
and discover nodes that are in an active, but degraded, state. In practice,
“sick” nodes arise for a variety of reasons and are fairly common as a result.
There are different kinds of snitches optimized to various deployments. For
Search WWH ::




Custom Search