Drilling Down on the Oplog
In simple terms, the oplog (operation log) is a capped collection that holds a rolling record of the changes a primary
instance makes to its databases. Those changes are replayed on the secondaries to keep their databases identical to the
primary's. Each member of a replica set maintains its own oplog, and the secondaries query the primary's oplog (or that
of another, more up-to-date secondary) for new entries to apply to their own copies of all databases.
Each oplog entry carries a timestamp. This enables a secondary to track how far it read through the oplog during a
previous read, and which entries it still needs to transfer to catch up. If you stop a secondary and restart it a
relatively short time later, it will use the primary's oplog to retrieve all the changes it missed while offline.
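The catch-up behavior described above can be sketched in plain JavaScript. This is an illustrative model only, not MongoDB's replication code: the simplified integer `ts` values, the `entriesToApply` helper, and the sample entries are all assumptions made for the example.

```javascript
// Illustrative model only: a tiny in-memory "oplog" with made-up entries.
// Each entry carries a timestamp (ts); real oplog entries hold more fields.
const primaryOplog = [
  { ts: 101, op: "i", ns: "test.users" },
  { ts: 102, op: "u", ns: "test.users" },
  { ts: 103, op: "d", ns: "test.users" },
  { ts: 104, op: "i", ns: "test.orders" },
];

// Hypothetical helper: return the entries a secondary still has to apply,
// given the timestamp of the last entry it applied before going offline.
function entriesToApply(oplog, lastAppliedTs) {
  return oplog.filter((entry) => entry.ts > lastAppliedTs);
}

// A secondary that stopped after applying ts 102 has missed two entries.
const missed = entriesToApply(primaryOplog, 102);
console.log(missed.map((entry) => entry.ts)); // [ 103, 104 ]
```

The point of the sketch is that the secondary needs to remember only one value, the timestamp of the last entry it applied, in order to resume where it left off.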
Because it is not practical to have an infinitely large oplog, the oplog is capped at a particular size.
You can think of the oplog as a window on the recent activity of your primary instance; if that window is too
small, then operations will be lost from the oplog before they can be applied to the secondaries. If an oplog has not yet
been created on the current instance, the --oplogSize startup option allows you to set the size of your oplog in MB. By
default for a Linux or Windows 64-bit system, the oplogSize will be set to five percent of the free disk space available
for data storage. If your system is write/update intensive, then you may need to increase this size to ensure that
secondaries can be offline for a reasonable amount of time without missing operations that have already rolled out of
the oplog.
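To make the "window" idea concrete, here is a minimal sketch of a capped log in plain JavaScript. It is a simplified model, not how MongoDB implements capped collections: the `CappedLog` class and its `canCatchUp` method are hypothetical, and the cap is counted in entries here, whereas the real oplog is capped by size.

```javascript
// Illustrative model of a capped log: keeps only the newest `capacity` entries.
// Assumption: the cap is an entry count; the real oplog is capped by size in MB.
class CappedLog {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = [];
  }

  append(entry) {
    this.entries.push(entry);
    if (this.entries.length > this.capacity) {
      this.entries.shift(); // the oldest entry rolls out of the window
    }
  }

  // A secondary can resume only if the last entry it applied is still
  // inside the window (or the log is empty).
  canCatchUp(lastAppliedTs) {
    if (this.entries.length === 0) return true;
    return this.entries[0].ts <= lastAppliedTs;
  }
}

const log = new CappedLog(3);
for (let ts = 1; ts <= 5; ts++) {
  log.append({ ts });
}

// The window now holds ts 3, 4, and 5; entries 1 and 2 have rolled out.
console.log(log.canCatchUp(4)); // true
console.log(log.canCatchUp(1)); // false
```

In this model, a secondary that last applied entry 1 can no longer catch up from the log alone, which is the situation a larger oplog is meant to prevent.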
For example, if you have a daily backup from a secondary that takes an hour to complete, the size of the oplog will
have to be set to allow that secondary to stay offline for that hour plus an additional amount of time to provide a
safety margin.
It's critical that you take into account the update rate on all the databases present on the primary when calculating
a suitable size for the oplog.
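The sizing reasoning above amounts to simple arithmetic, sketched here in JavaScript. The `requiredOplogMB` helper and the figures used (200 MB of oplog traffic per hour, a two-hour safety margin) are assumptions chosen purely for illustration; substitute the rate you measure on your own primary.

```javascript
// Hypothetical helper: estimate the oplog size needed to cover an offline
// period plus a safety margin, given the oplog write rate across all
// databases on the primary.
function requiredOplogMB(oplogMBPerHour, offlineHours, safetyMarginHours) {
  return Math.ceil(oplogMBPerHour * (offlineHours + safetyMarginHours));
}

// Assumed figures: 200 MB/hour of oplog traffic, a one-hour backup,
// and a two-hour safety margin.
console.log(requiredOplogMB(200, 1, 2)); // 600
```

A result like this is only a lower bound for the steady-state rate you plugged in; bursts of writes during the offline period would call for a larger margin.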
You can get some idea about a suitable size for your oplog by using the db.printReplicationInfo() command,
which you run on the primary instance:
$ mongo
> db.printReplicationInfo()
configured oplog size: 15000MB
log length start to end: 6456672secs (1793.52hrs)
oplog first event time: Wed Mar 20 2013 17:00:43 GMT+1100 (EST)
oplog last event time: Mon Jun 03 2013 09:31:55 GMT+1000 (EST)
now: Mon Jun 03 2013 20:22:20 GMT+1000 (EST)
This command shows the configured size of your oplog, as well as the span of time its current contents cover, which
indicates how long it takes to fill up at the current update rate. From this information, you can estimate whether you
need to increase or decrease the size of your oplog. You can also look at how far behind the primary a given member of
your replica set is by reviewing the repl lag section in MongoDB Monitoring Service (MMS). If you have not installed
MMS already, I strongly suggest you do so now: the larger and more scaled out your MongoDB cluster becomes, the more
important statistics like those MMS provides become. For more background, you should review the MMS section of
Chapter 9.
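As a rough worked example, the figures in the sample db.printReplicationInfo() output above can be turned into an average oplog write rate. This sketch assumes the oplog has already filled and started rolling over, so that the configured size is actually in use; if it has not, the calculation overestimates the rate. The variable names are illustrative.

```javascript
// Figures copied from the sample db.printReplicationInfo() output.
const configuredOplogMB = 15000;
const logLengthSecs = 6456672;

// Time window covered by the oplog, in hours.
const windowHours = logLengthSecs / 3600;

// Average oplog write rate, assuming the full configured size is in use.
const mbPerHour = configuredOplogMB / windowHours;

console.log(windowHours.toFixed(2)); // 1793.52
console.log(mbPerHour.toFixed(2)); // 8.36
```

At roughly 8 MB per hour, an oplog of this size covers about 75 days of activity, which suggests that this particular instance could run with a smaller oplog if disk space were tight.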
Implementing a Replica Set
In this section, you will learn how to set up a simple replica set configuration. You will also learn how to add and
remove members from the cluster. As discussed earlier, replica sets are based on the concept of a single primary
server and a number of secondary servers that replicate writes from the primary, optionally joined by arbiter servers
(see Figure 11-3).
 