1.5.3 Partition Tolerance
Partition tolerance means the system continues to operate despite arbitrary message loss or
failure of part of the system. The simplest example of partition tolerance is when the
system continues to operate even if the machines involved in providing the service lose the
ability to communicate with each other due to a network link going down (see Figure 1.8).
Figure 1.8: Nodes partitioned from each other
Returning to our example of replicas, if the system is read-only it is easy to make the
system partition tolerant, as the replicas do not need to communicate with each other. But
consider the example of replicas containing state that is updated on one replica first, then
copied to other replicas. If the replicas are unable to communicate with each other, the
system can no longer guarantee that updates will propagate within a certain amount of time,
and so it becomes a failed system.
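The replica scenario above can be sketched as a small simulation. This is a minimal illustration, not a real replication protocol: the `Replica` and `Cluster` classes, the `max_staleness` bound, and the logical clock advanced by `tick` are all hypothetical names introduced here. The primary copies its state to every reachable follower on each tick, and a follower counts as stale once its last successful copy is older than the bound.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    state: dict = field(default_factory=dict)
    last_sync: float = 0.0  # logical time of the last successful copy


class Cluster:
    """Hypothetical model: writes land on a primary, then are copied to followers."""

    def __init__(self, names, max_staleness):
        self.primary = Replica(names[0])
        self.followers = [Replica(n) for n in names[1:]]
        self.max_staleness = max_staleness  # assumed propagation bound
        self.cut_links = set()  # followers the primary cannot reach
        self.now = 0.0

    def partition(self, follower_name):
        self.cut_links.add(follower_name)

    def heal(self, follower_name):
        self.cut_links.discard(follower_name)

    def write(self, key, value):
        # Updates are applied to the primary first.
        self.primary.state[key] = value

    def tick(self, dt=1.0):
        """Advance time and copy state to every reachable follower."""
        self.now += dt
        for follower in self.followers:
            if follower.name not in self.cut_links:
                follower.state = dict(self.primary.state)
                follower.last_sync = self.now

    def stale_followers(self):
        """Followers whose last copy is older than the propagation bound."""
        return [f.name for f in self.followers
                if self.now - f.last_sync > self.max_staleness]


cluster = Cluster(["a", "b", "c"], max_staleness=2.0)
cluster.write("x", 1)
cluster.tick()              # every replica now holds x=1
cluster.partition("c")      # the link to replica c goes down
cluster.write("x", 2)
for _ in range(3):
    cluster.tick()
print(cluster.stale_followers())  # ['c']
```

In this sketch replica c still answers reads, but with x=1, and the propagation guarantee is broken for as long as the partition lasts. Healing the link lets the next tick bring it back within the bound.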
Now consider a situation where two servers cooperate in a master-slave relationship.
Both maintain a complete copy of the state, and the slave takes over the master's role if the
master fails. Failure is detected by a loss of heartbeat, that is, a periodic health check
between the two servers, often done via a dedicated network. If the heartbeat network between
the two is partitioned, the slave will promote itself to being the master, not knowing that
the original master is up but unable to communicate on the heartbeat network. At this point
there are two masters and the system breaks. This situation is called split brain.
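The split-brain scenario can be illustrated with a small simulation. This is a sketch under stated assumptions rather than how any particular failover product works: the `Node` class, the `timeout` of three missed heartbeats, and the `live_masters` helper are hypothetical. The point it shows is that the slave sees only whether a heartbeat arrived, so a dead master and a cut heartbeat link look identical to it.

```python
class Node:
    def __init__(self, name, role, timeout=3):
        self.name = name
        self.role = role        # "master" or "slave"
        self.timeout = timeout  # missed heartbeats tolerated before failover (assumed)
        self.missed = 0

    def on_tick(self, heartbeat_received):
        """Slave-side check: promote after `timeout` consecutive misses."""
        if self.role != "slave":
            return
        self.missed = 0 if heartbeat_received else self.missed + 1
        if self.missed >= self.timeout:
            self.role = "master"


def live_masters(ticks, link_up, master_alive):
    """Return the names of running nodes that believe they are master."""
    master = Node("node-1", "master")
    slave = Node("node-2", "slave")
    for _ in range(ticks):
        # A heartbeat arrives only if the master is up AND the link works.
        slave.on_tick(heartbeat_received=master_alive and link_up)
    running = [slave] + ([master] if master_alive else [])
    return sorted(n.name for n in running if n.role == "master")


print(live_masters(5, link_up=True, master_alive=True))    # ['node-1']
print(live_masters(5, link_up=True, master_alive=False))   # ['node-2']
print(live_masters(5, link_up=False, master_alive=True))   # ['node-1', 'node-2']
```

The second call is a legitimate failover. The third is split brain: the slave runs exactly the same promotion logic, but the original master is still running, so two nodes act as master at once.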