ascertain its own health. When you run rs.status(), you see the timestamp of each
node's last heartbeat along with its state of health (1 means healthy and 0 means
unresponsive).
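
To make the health check concrete, here is a minimal sketch of how you might summarize a document shaped like the output of rs.status(). The helper name summarizeHealth, the host names, and the sample document are hypothetical; the sketch assumes only the members array and its name, health, stateStr, and lastHeartbeat fields.

```javascript
// Sketch: summarize node health from a document shaped like rs.status() output.
// summarizeHealth and the sample data are illustrative, not part of MongoDB.
function summarizeHealth(status) {
  return status.members.map(function (m) {
    return {
      name: m.name,
      state: m.stateStr,
      healthy: m.health === 1,                // 1 means healthy, 0 means unresponsive
      lastHeartbeat: m.lastHeartbeat || null  // may be absent for the reporting node itself
    };
  });
}

// Hypothetical status document for a three-node set.
const sampleStatus = {
  set: "myapp",
  members: [
    { name: "host-a:40000", health: 1, stateStr: "PRIMARY" },
    { name: "host-b:40001", health: 1, stateStr: "SECONDARY",
      lastHeartbeat: new Date("2011-02-01T11:26:30Z") },
    { name: "host-c:40002", health: 0, stateStr: "(not reachable/healthy)",
      lastHeartbeat: new Date("2011-02-01T11:20:00Z") }
  ]
};

// Collect the nodes that the reporting node considers unresponsive.
const unhealthy = summarizeHealth(sampleStatus).filter(function (n) {
  return !n.healthy;
});
console.log(unhealthy.map(function (n) { return n.name; }));
```

In a live shell session you could presumably pass the result of rs.status() straight to the same helper instead of the sample document.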
As long as every node remains healthy and responsive, the replica set will hum
along its merry way. But if any node becomes unresponsive, the set may need to change
state. What every replica set wants is to ensure that exactly one primary node exists at all times.
But this is possible only when a majority of nodes is visible. For example, look back at
the replica set you built in the previous section. If you kill the secondary, then a major-
ity of nodes still exists, so the replica set doesn't change state but simply waits for the
secondary to come back online. If you kill the primary, then a majority still exists, but
there's no primary. Therefore, the secondary is automatically promoted to primary. If
more than one secondary happens to exist, then the most current secondary will be
the one elected.
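
The scenarios above can be sketched as a small model. This is a deliberate simplification, not MongoDB's actual election code: the function names, node names, and the numeric optime field (standing in for how current a node's data is) are all assumptions made for illustration.

```javascript
// Simplified model of the majority rule; not MongoDB's real election logic.
function hasMajority(totalNodes, visibleNodes) {
  return visibleNodes > Math.floor(totalNodes / 2);
}

// nodes: [{ name, role: "primary" | "secondary" | "arbiter", up, optime }]
// Returns the name of the node that should be primary, or null if none can be.
function nextPrimary(nodes) {
  const up = nodes.filter(function (n) { return n.up; });
  if (!hasMajority(nodes.length, up.length)) return null;

  // A healthy primary that still sees a majority keeps its role.
  const current = up.find(function (n) { return n.role === "primary"; });
  if (current) return current.name;

  // Otherwise the most current secondary is elected; arbiters never hold data.
  const candidates = up.filter(function (n) { return n.role === "secondary"; });
  if (candidates.length === 0) return null;
  candidates.sort(function (a, b) { return b.optime - a.optime; });
  return candidates[0].name;
}

// Hypothetical three-node set: one primary, one secondary, one arbiter.
function threeNodeSet(primaryUp, secondaryUp, arbiterUp) {
  return [
    { name: "node-a", role: "primary", up: primaryUp, optime: 100 },
    { name: "node-b", role: "secondary", up: secondaryUp, optime: 100 },
    { name: "node-c", role: "arbiter", up: arbiterUp, optime: 0 }
  ];
}

console.log(nextPrimary(threeNodeSet(true, false, true)));  // secondary killed: node-a
console.log(nextPrimary(threeNodeSet(false, true, true)));  // primary killed: node-b
console.log(nextPrimary(threeNodeSet(true, false, false))); // no majority: null
```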
But there are other possible scenarios. Imagine that both the secondary and the
arbiter are killed. Now the primary remains, but there's no majority—only one of
the three original nodes remains healthy. In this case, you'll see a message like this
in the primary's log:
Tue Feb 1 11:26:38 [rs Manager] replSet can't see a majority of the set,
relinquishing primary
Tue Feb 1 11:26:38 [rs Manager] replSet relinquishing primary state
Tue Feb 1 11:26:38 [rs Manager] replSet SECONDARY
With no majority, the primary actually demotes itself to a secondary. This may seem
puzzling, but think about what might happen if this node were allowed to remain pri-
mary. If the heartbeats fail due to some kind of network partition, then the other
nodes will still be online. If the arbiter and secondary are still up and able to see each
other, then according to the rule of the majority, the remaining secondary will
become a primary. If the original primary doesn't step down, then you're suddenly in
an untenable situation: a replica set with two primary nodes. If the application contin-
ues to run, then it might write to and read from two different primaries, a sure recipe
for inconsistency and truly bizarre application behavior. Therefore, when the primary
can't see a majority, it must step down.
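
As a rough illustration of why stepping down prevents two primaries, the sketch below applies the majority rule to each side of a network partition. The function names are hypothetical, and the model assumes every node lands on exactly one side of the split.

```javascript
// A node may act as primary only if it can see a strict majority of the set
// (counting itself). Simplified model; names are illustrative.
function canBePrimary(totalNodes, visibleNodes) {
  return visibleNodes * 2 > totalNodes;
}

// Evaluate both sides of a partition that splits the set into two groups.
function evaluatePartition(totalNodes, sideA, sideB) {
  return {
    sideA: canBePrimary(totalNodes, sideA),
    sideB: canBePrimary(totalNodes, sideB)
  };
}

// The scenario from the text: the old primary is isolated (1 node), while the
// secondary and arbiter can still see each other (2 nodes).
const result = evaluatePartition(3, 1, 2);
console.log(result); // the old primary's side must step down; the other side may elect

// Check every two-way split of sets with up to nine nodes: both sides can
// never hold a majority at once, so two primaries cannot coexist.
let twoPrimariesPossible = false;
for (let total = 1; total <= 9; total++) {
  for (let a = 0; a <= total; a++) {
    const r = evaluatePartition(total, a, total - a);
    if (r.sideA && r.sideB) twoPrimariesPossible = true;
  }
}
console.log(twoPrimariesPossible);
```

The strict inequality matters: with an even number of nodes split evenly, neither side has a majority, so neither side keeps or elects a primary.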
COMMIT AND ROLLBACK
One final important point to understand about replica sets is the concept of a commit.
In essence, you can write to a primary node all day long, but those writes won't be con-
sidered committed until they've been replicated to a majority of nodes. What do I
mean by committed here? The idea can best be explained by example. Imagine again
the replica set you built in the previous section. Suppose you issue a series of writes to
the primary that don't get replicated to the secondary for some reason (connectivity
issues, secondary is down for backup, secondary is lagging, and so forth). Now sup-
pose further that the secondary is suddenly promoted to primary. You write to the new
primary, and eventually the old primary comes back online and tries to replicate from