Sharding - MongoDB in Action

{ "_id" : "balancer", "process" : "arete:40000:1299516887:1804289383",

"state" : 1,

"ts" : ObjectId("4d890d30bd9f205b29eda79e"),

"when" : ISODate("2011-03-22T20:57:20.249Z"),

"who" : "arete:40000:1299516887:1804289383:Balancer:846930886",

"why" : "doing balance round"

}

Any state greater than 0 indicates that balancing is happening. The process field

shows the host name and port of the computer running the mongos that's orchestrating

the balancing round. In this case, the host is arete:40000. If balancing ever fails

to stop after you modify the settings collection, you should examine the logs from the

balancing mongos for errors.
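
The check described above can be sketched as a pair of small helpers. This is only an illustration, not part of MongoDB: the names isBalancing and balancerHost are hypothetical, and the code assumes the process field always begins with host:port, as it does in the sample lock document.

```javascript
// Hypothetical helpers; each takes a lock document as a plain object.

// True when the lock document reports an active balancing round
// (any state greater than 0).
function isBalancing(lockDoc) {
  return lockDoc !== null && lockDoc !== undefined && lockDoc.state > 0;
}

// Extract "host:port" from a process string such as
// "arete:40000:1299516887:1804289383" (format assumed from the sample).
function balancerHost(lockDoc) {
  var parts = String(lockDoc.process).split(":");
  return parts[0] + ":" + parts[1];
}

// In the mongo shell, the document would come from the config database:
//   var lock = db.getSiblingDB("config").locks.findOne({_id: "balancer"});
// Here a trimmed copy of the sample document stands in for it.
var lock = {
  _id: "balancer",
  process: "arete:40000:1299516887:1804289383",
  state: 1,
  why: "doing balance round"
};

var balancing = isBalancing(lock);   // is a round in progress?
var host = balancerHost(lock);       // which mongos to check logs on
```

If balancing stays true long after the balancer was disabled, host names the mongos whose logs are worth examining.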

Once you know that the balancer has stopped, it's safe to run your backups. After

taking your backups, don't forget to restart the balancer. You can do so by resetting

the stopped value:

> use config

> db.settings.update({_id: "balancer"}, {$set: {stopped: false}}, true);

To simplify some of these operations with the balancer, MongoDB v2.0 has introduced

a couple of shell helpers. For example, you can start and stop the balancer with

sh.setBalancerState():

> sh.setBalancerState(false)

This is equivalent to adjusting the stopped value in the settings collection. Once

you've disabled the balancer in this way, you can make repeated calls to

sh.isBalancerRunning() until the balancer stops.
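
The repeated calls can be wrapped in a simple polling loop. The following is a minimal sketch under stated assumptions: waitForBalancerStop is a hypothetical helper, not a MongoDB function, and it takes the status check and the pause as arguments so that the loop itself does not depend on the shell.

```javascript
// Hypothetical helper: polls until the balancer reports that it has stopped.
//   isRunning: function returning true while a balancing round is active
//   pause:     function that waits between polls
//   maxTries:  give up after this many polls
function waitForBalancerStop(isRunning, pause, maxTries) {
  for (var i = 0; i < maxTries; i++) {
    if (!isRunning()) {
      return true;    // balancer is idle; safe to proceed
    }
    pause();
  }
  return false;       // still running after maxTries polls
}

// In the mongo shell this might be called as follows
// (the one-second pause and 60 tries are arbitrary choices):
//   sh.setBalancerState(false);
//   var stopped = waitForBalancerStop(
//     function () { return sh.isBalancerRunning(); },
//     function () { sleep(1000); },
//     60);
```

A false return value means the balancer never went idle within the allotted polls, which is the point at which the mongos logs deserve a look.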

FAILOVER AND RECOVERY

Although we've covered general replica set failures, it's also important to note a

sharded cluster's potential points of failure along with best practices for recovery.

Failure of a shard member

Each shard consists of a replica set. Thus if the primary of one of these replica sets

fails, a secondary member will be elected primary, and the mongos process will

automatically connect to it. Chapter 8 describes the specific steps to take in restoring a

failed replica set member. The method you choose depends on how the member has

failed, but regardless, the instructions are the same whether the replica set is part of a

sharded cluster or not.

If you see anomalous behavior after a replica set failover, you can reset the system

by restarting all mongos processes. This will ensure proper connections to the new

replica sets. In addition, if you notice that balancing isn't working, you should check the

config database's locks collection for entries whose process fields point to former

primary nodes. If you see such an entry, the lock document is stale, and it's safe

to delete it manually.
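
The search for stale locks can be sketched as a filter over the locks collection. This is an illustration only: findStaleLocks is a hypothetical helper, the host names are made up, and the code assumes that each process field begins with host:port, as in the balancer lock shown earlier.

```javascript
// Hypothetical helper: returns the lock documents whose process field
// starts with the "host:port" of a node that is no longer primary.
function findStaleLocks(lockDocs, formerPrimaries) {
  return lockDocs.filter(function (lock) {
    return formerPrimaries.some(function (hostPort) {
      return String(lock.process).indexOf(hostPort + ":") === 0;
    });
  });
}

// Made-up sample data standing in for the contents of config.locks.
var sampleLocks = [
  { _id: "balancer", process: "arete:30000:1299516887:1804289383", state: 1 },
  { _id: "cloud-docs.spreadsheets", process: "arete:30001:1299516890:846930886", state: 0 }
];

// Suppose arete:30000 was the primary before the failover.
var stale = findStaleLocks(sampleLocks, ["arete:30000"]);

// In the mongo shell, the real documents might be gathered and reviewed with:
//   var locks = db.getSiblingDB("config").locks.find().toArray();
//   findStaleLocks(locks, ["arete:30000"]).forEach(printjson);
// and, only after inspecting a stale entry, removed with:
//   db.getSiblingDB("config").locks.remove({_id: "balancer"});
```

Printing the candidates before removing anything keeps the deletion a deliberate, manual step.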
