like a master in a master-slave mechanism. It is just another node that helps newly joining
nodes bootstrap the gossip protocol. So, seeds are not a single point of failure (SPOF), nor
do they have any other purpose that makes them superior.
Failure detection
Failure detection is one of the fundamental features of any robust distributed system.
A good failure detection mechanism is part of what makes a system such as Cassandra
fault tolerant. The failure detector that Cassandra uses is a variation of The ϕ accrual
failure detector (2004) by Xavier Défago and others
(http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.3350).
The idea behind a failure detector is to detect a communication failure and take appropriate
actions based on the state of the remote node. Unlike traditional failure detectors, the phi
accrual failure detector does not emit a Boolean alive or dead (true or false, trust or
suspect) value. Instead, it gives the application a continuous value, and the application is
left to decide the level of severity and act accordingly. This continuous suspicion value is
called phi (ϕ). So, how does ϕ get calculated?
Let's say we are observing the heartbeats sent from a process on a remote machine. Assume
that the latest heartbeat arrived at time $T_{last}$, the current time is $t_{now}$, and
$P_{later}(t)$ is the probability that a heartbeat will arrive more than $t$ time units after
the previous one. Then ϕ can be calculated as follows:
$$\phi(t_{now}) = -\log_{10}\left(P_{later}(t_{now} - T_{last})\right)$$
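The formula can be made concrete with a small sketch. The class and method names below are hypothetical and are not Cassandra's API. The sketch also assumes, purely for simplicity, that heartbeat inter-arrival times follow an exponential distribution, so that P later (t) = exp(-t / mean); the original paper instead estimates this probability from a normal distribution fitted to a sliding window of samples.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Toy phi accrual detector (illustrative names, not Cassandra's API)."""

    def __init__(self, window_size=1000):
        # Sliding window of recent inter-arrival times between heartbeats.
        self.intervals = deque(maxlen=window_size)
        self.t_last = None  # arrival time of the latest heartbeat

    def heartbeat(self, t_now):
        """Record a heartbeat that arrived at time t_now."""
        if self.t_last is not None:
            self.intervals.append(t_now - self.t_last)
        self.t_last = t_now

    def phi(self, t_now):
        """Return -log10(P_later(t_now - T_last)) for the observed process."""
        if not self.intervals:
            return 0.0  # no inter-arrival history yet, so no suspicion
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = t_now - self.t_last
        # ASSUMPTION: exponential inter-arrival times, P_later(t) = exp(-t / mean).
        # Then -log10(exp(-t / mean)) simplifies to t / (mean * ln 10), which
        # also avoids taking the logarithm of a value that underflows to zero.
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for arrival in (0.0, 1.0, 2.0, 3.0):  # heartbeats at a constant interval of 1
    detector.heartbeat(arrival)

for now in (3.0, 4.0, 6.0, 10.0):
    print(now, round(detector.phi(now), 3))
```

Under these assumptions ϕ is zero immediately after a heartbeat and grows steadily for as long as the next one fails to arrive. The caller, not the detector, decides which value of ϕ is high enough to treat the remote node as down.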
Let's observe this formula informally using common sense. On a sunny day, when
everything is fine and heartbeats arrive at a constant interval ∆t, the next heartbeat
becomes more and more likely to have already arrived as the elapsed time (t now - T last )
approaches ∆t. In other words, the probability that it will arrive even later keeps
shrinking, so the value of ϕ goes up. If a heartbeat is not received at ∆t, the further we
move past it, the lower the value of P later becomes, and the value of ϕ keeps on increasing,
as shown in the following figure: