standard Ctrl-C or kill -2 will do the trick. You can also connect to the primary
using the shell and run db.shutdownServer().
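For example, a clean shutdown from the shell looks like this. The host and port match the primary used in this chapter's examples, and db.shutdownServer() must be run against the admin database:

$ mongo arete:40000
> use admin
> db.shutdownServer()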
Once you've killed the primary, note that the secondary detects the lapse in the
primary's heartbeat. The secondary then elects itself primary. This election is possible
because a majority of the original nodes (the arbiter and the original secondary) are
still able to ping each other. Here's an excerpt from the secondary node's log:
[ReplSetHealthPollTask] replSet info arete:40000 is down (or slow to respond)
Mon Jan 31 22:56:22 [rs Manager] replSet info electSelf 1
Mon Jan 31 22:56:22 [rs Manager] replSet PRIMARY
If you connect to the new primary node and check the replica set status, you'll see that
the old primary is unreachable:
> rs.status()
{
    "_id" : 0,
    "name" : "arete:40000",
    "health" : 1,
    "state" : 6,
    "stateStr" : "(not reachable/healthy)",
    "uptime" : 0,
    "optime" : {
        "t" : 1296510078000,
        "i" : 1
    },
    "optimeDate" : ISODate("2011-01-31T21:43:18Z"),
    "lastHeartbeat" : ISODate("2011-02-01T03:29:30Z"),
    "errmsg" : "socket exception"
}
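The document above is one entry in the members array that rs.status() returns. If you just want a one-line summary for each member rather than the full document, you can iterate over that array yourself. Here's a minimal sketch using the same name and stateStr fields shown in the output:

> rs.status().members.forEach(function(m) { print(m.name + "  " + m.stateStr) })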
Post-failover, the replica set consists of just two nodes. Because the arbiter has no data,
your application will continue to function as long as it communicates with the primary
node only.³ Even so, replication isn't happening, and there's now no possibility of
failover. The old primary must be restored. Assuming that the old primary was shut
down cleanly, you can bring it back online, and it'll automatically rejoin the replica set
as a secondary. Go ahead and try that now by restarting the old primary node.
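Restarting it is just a matter of launching mongod with the same options the node was originally started with. The replica set name and data directory below are placeholders; use whatever values you chose when you first brought the node up:

$ mongod --replSet myapp --dbpath /data/node1 --port 40000

Once the process is back up and has caught up on replication, rs.status() run against the current primary will report this member as SECONDARY rather than "(not reachable/healthy)".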
That's a clean overview of replica sets. Some of the details are, unsurprisingly,
messier. In the next two sections, you'll see how replica sets actually work, and look at
deployment, advanced configuration, and how to handle tricky scenarios that may
arise in production.
8.2.2 How replication works
Replica sets rely on two basic mechanisms: an oplog and a heartbeat. The oplog enables
the replication of data, and the heartbeat monitors health and triggers failover. You'll
³ Applications sometimes query secondary nodes for read scaling. If that's happening, then this kind of failure
will cause read failures. Thus it's important to design your application with failover in mind. More on this at
the end of the chapter.
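For a first glimpse of the oplog itself, note that each replica set member stores it in a capped collection named oplog.rs inside the local database, so you can inspect the most recent operation from the shell:

> use local
> db.oplog.rs.find().sort({$natural: -1}).limit(1)

The descending $natural sort returns entries in reverse insertion order, so limit(1) gives you the latest write recorded in the oplog.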