Architecture - Practical Cassandra

Database Reference

In-Depth Information

Gossip Protocol

The way that Cassandra nodes talk to each other is through something called the

Gossip protocol. Using the Gossip protocol, each node has the ability to tell other

nodes how it's doing and find out what other nodes are up to. In other words, the

Gossip protocol exists to ensure that each node knows the state of itself and every

other node in the ring.

The Gossip protocol works by creating a Gossiper endpoint when the system

starts. This happens in the following manner:

1. When Cassandra starts up, it registers itself with the Gossiper to receive

endpoint state information.

2. Periodically, typically once per second, the Gossiper will choose a random

node in the ring and start a Gossip session with it. Each round of Gossip

requires a set of three messages similar to a TCP transaction.

3. The node that is initiating the Gossip sends the receiving node a Gos-

sipDigestSynMessage. This means that it is requesting a synchronization.

4. When the receiving node gets the request, it will respond (assuming it's

not dead) with a GossipDigestAckMessage. This means that the receiving

node acknowledges the message.

5. Then the initiating node receives the ack message and it sends the receiv-

ing node a GossipDigestAck2Message.

Failure Detection

Gossip in the cluster happens frequently because the Gossip protocol is also re-

sponsible for failure detection of nodes. If the receiving node doesn't answer in a

timely manner (or at all), the initiating node will assume that the node is down and

mark it so within the ring information.

Failure of a node or group of nodes within a ring is handled by the Phi Accrual

Failure Detection algorithm. As a result, the associated failure detection threshold

setting is called the *phi_convict_threshold* . The Phi convict threshold

is a setting that adjusts the sensitivity level of failure detection. It is worth noting

that this setting is on an exponential scale. A lower value increases the likelihood

that an unresponsive node will be marked as down. A higher value decreases the

likelihood that a transient failure (such as temporary loss of network connectivity)

will cause a node failure. The default setting is 8. If you are operating your infra-

Search WWH ::

Custom Search

Home