{ "_id" : "balancer", "process" : "arete:40000:1299516887:1804289383",
"state" : 1,
"ts" : ObjectId("4d890d30bd9f205b29eda79e"),
"when" : ISODate("2011-03-22T20:57:20.249Z"),
"who" : "arete:40000:1299516887:1804289383:Balancer:846930886",
"why" : "doing balance round"
}
Any state greater than 0 indicates that balancing is happening. The process field
shows the host name and port of the computer running the mongos that's orchestrating
the balancing round. In this case, the host is arete:40000. If balancing ever fails
to stop after you modify the settings collection, you should examine the logs from the
balancing mongos for errors.
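As a rough illustration of reading these fields, the sketch below summarizes a balancer lock document. The helper name describeBalancerLock is hypothetical (it is not a shell built-in), and it assumes the process field always begins with host:port, as in the document shown above.

```javascript
// Summarize a balancer lock document taken from the config database's locks collection.
// describeBalancerLock is a hypothetical helper, not part of the MongoDB shell.
function describeBalancerLock(lockDoc) {
  if (!lockDoc) {
    return { balancing: false, mongos: null, why: null };
  }
  // Assumption: process looks like "host:port:...", so the first two
  // colon-separated segments identify the mongos running the round.
  var parts = String(lockDoc.process || "").split(":");
  var mongos = parts.length >= 2 ? parts[0] + ":" + parts[1] : null;
  return {
    balancing: lockDoc.state > 0, // any state greater than 0 means balancing is happening
    mongos: mongos,
    why: lockDoc.why || null
  };
}

// Example using the field values from the lock document above.
var summary = describeBalancerLock({
  _id: "balancer",
  process: "arete:40000:1299516887:1804289383",
  state: 1,
  why: "doing balance round"
});
```

In the shell, you could pass the result of db.getSiblingDB("config").locks.findOne({_id: "balancer"}) instead of the literal document.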
Once you know that the balancer has stopped, it's safe to run your backups. After
taking your backups, don't forget to restart the balancer. You can do so by resetting
the stopped value:
> use config
> db.settings.update({_id: "balancer"}, {$set: {stopped: false}}, true);
To simplify some of these operations with the balancer, MongoDB v2.0 has introduced
a couple shell helpers. For example, you can start and stop the balancer with
sh.setBalancerState():
> sh.setBalancerState(false)
This is equivalent to adjusting the stopped value in the settings collection. Once
you've disabled the balancer in this way, you can call sh.isBalancerRunning()
repeatedly until it reports that the balancer has stopped.
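The repeated calls can be wrapped in a small polling loop. This is only a sketch: waitForBalancerToStop is a hypothetical helper, and the check and pause functions are passed in as arguments so the loop doesn't depend on any particular shell version.

```javascript
// Poll until the balancer reports that it is no longer running.
// waitForBalancerToStop is a hypothetical helper, not a shell built-in.
//   isRunning  - function returning true while a balancing round is active
//   pause      - function that blocks for the given number of milliseconds
//   maxChecks  - give up after this many checks
//   intervalMs - delay between checks
function waitForBalancerToStop(isRunning, pause, maxChecks, intervalMs) {
  for (var i = 0; i < maxChecks; i++) {
    if (!isRunning()) {
      return true; // balancer has stopped; safe to proceed
    }
    pause(intervalMs);
  }
  return false; // still running after maxChecks attempts
}

// Possible use in the shell (the limits are illustrative, and the shell's
// sleep() function is assumed to be available):
//   sh.setBalancerState(false)
//   waitForBalancerToStop(function () { return sh.isBalancerRunning(); }, sleep, 60, 1000)
```

If the function returns false, the balancer is still running after the allotted checks, which is the point at which the logs of the balancing mongos are worth examining.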
FAILOVER AND RECOVERY
Although we've covered general replica set failures, it's also important to note a
sharded cluster's potential points of failure along with best practices for recovery.
Failure of a shard member
Each shard consists of a replica set. Thus if the primary member of one of these
replica sets fails, a secondary member will be elected primary, and the mongos process
will automatically connect to it. Chapter 8 describes the specific steps to take in
restoring a failed replica set member. The method you choose depends on how the member
has failed, but regardless, the instructions are the same whether the replica set is
part of a sharded cluster or not.
If you see anomalous behavior after a replica set failover, you can reset the system
by restarting all mongos processes. This will ensure proper connections to the new
replica sets. In addition, if you notice that balancing isn't working, you should check
the config database's locks collection for entries whose process fields point to former
primary nodes. If you see such an entry, the lock document is stale, and you're safe
manually deleting it.
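One way to look for such entries is sketched below. The helper findStaleLocks and the host names in the usage note are hypothetical, and the match assumes that each process field begins with host:port followed by a colon, as in the balancer lock shown earlier.

```javascript
// Return the lock documents whose process field points at a former primary.
// findStaleLocks is a hypothetical helper, not a shell built-in.
//   lockDocs        - array of documents from the config database's locks collection
//   formerPrimaries - array of "host:port" strings for nodes that are no longer primary
function findStaleLocks(lockDocs, formerPrimaries) {
  return lockDocs.filter(function (lock) {
    var process = String(lock.process || "");
    return formerPrimaries.some(function (hostPort) {
      // Append ":" so that "arete:3000" does not match "arete:30000".
      return process.indexOf(hostPort + ":") === 0;
    });
  });
}
```

In the shell, the input could come from db.getSiblingDB("config").locks.find().toArray(), with something like ["arete:30000"] as the list of former primaries. The sketch only reports candidates; each one should be reviewed before it is deleted by hand.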