When looking at the database stats, it's also worth noting the difference between dataSize and storageSize. If storageSize exceeds dataSize by more than a factor of two, then performance may suffer because of on-disk fragmentation. This fragmentation can force the machine to use much more RAM than is required; so in this case, you may want to try compacting your data files before adding more RAM. See the section on compaction earlier in the chapter for instructions on how to do this.
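The factor-of-two rule of thumb above can be sketched as a small check. This is a minimal illustration, assuming a plain object carrying the numeric dataSize and storageSize fields reported by db.stats(); the function names, the threshold constant, and the sample numbers are hypothetical.

```javascript
// Sketch: flag possible on-disk fragmentation from a db.stats() result.
// `stats` is assumed to be a plain object with numeric dataSize and
// storageSize fields. All names below are hypothetical helpers.
const FRAGMENTATION_THRESHOLD = 2; // the "factor of two" rule of thumb

function fragmentationRatio(stats) {
  if (!(stats.dataSize > 0)) {
    return null; // empty database: the ratio is undefined
  }
  return stats.storageSize / stats.dataSize;
}

function looksFragmented(stats) {
  const ratio = fragmentationRatio(stats);
  // "More than a factor of two": a ratio of exactly 2 does not count.
  return ratio !== null && ratio > FRAGMENTATION_THRESHOLD;
}

// Example with made-up byte counts:
const sample = { dataSize: 1000000, storageSize: 2500000 };
console.log(fragmentationRatio(sample), looksFragmented(sample));
```

In the shell, the same functions could be applied to the value returned by db.stats(); a positive result is only a hint that compaction is worth trying before buying more RAM.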
10.4.3 Increase disk performance
There are a couple of issues with adding RAM. The first is that it isn't always possible; for example, if you're running on EC2, then at the time of writing the largest available virtual machine limits you to 68 GB of RAM. The second issue is that adding RAM doesn't always solve the I/O problem. For instance, if your application is write intensive, then the background flushes or the paging of new data into RAM may overwhelm your disks anyway. Thus if you have efficient indexes and sufficient RAM and still see disk I/O slowness, then you may want to look into improving disk performance.
There are two ways to increase disk performance. One is to purchase faster disks. A 15K RPM drive or an SSD might be worth the investment. Alternatively, or in addition, you can configure your disks in a RAID array, as this can increase both read and write throughput.[12] A RAID array may resolve I/O bottlenecks if configured properly. As mentioned, running a RAID 10 on EBS volumes increases read throughput significantly.
10.4.4 Scale horizontally
Horizontal scaling is the next obvious step to take in addressing a performance problem. Here there are two routes you can take. If your application is read intensive, it may be that a single node can't serve all the queries demanded of it, even with optimized indexes and data in RAM. This may call for distribution of reads across replicas. The official MongoDB drivers provide support for scaling across the members of a replica set, and this strategy is worth a try before escalating to a sharded cluster.
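As a rough sketch of distributing reads across replicas, current drivers commonly accept a read preference in the connection string. The helper below only builds such a string; the function name, the host names, and the replica set name are hypothetical placeholders, and the exact mechanism depends on the driver and version you're using.

```javascript
// Sketch: build a connection string that asks the driver to prefer
// secondaries for reads. replicaReadUri is a hypothetical helper; the
// hosts and replica set name are placeholders, not real servers.
function replicaReadUri(hosts, replicaSet, readPreference = "secondaryPreferred") {
  // replicaSet and readPreference are standard connection string options.
  const params = new URLSearchParams({ replicaSet, readPreference });
  return `mongodb://${hosts.join(",")}/?${params.toString()}`;
}

const uri = replicaReadUri(
  ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
  "myReplSet"
);
console.log(uri);
```

Reads served by secondaries may lag the primary slightly, so this approach suits queries that can tolerate somewhat stale data.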
When all else fails, there's sharding. You should move to a sharded cluster when any of the following apply:
- You can't fit your working set entirely into the physical RAM of any one machine.
- The write load is too intensive for any one machine.
If you've set up a sharded cluster and still experience performance issues, then you should first go back and make sure that all your indexes are optimized, that data is fitting into RAM, and that your disks are performing effectively. To get the best hardware utilization, you may need to add more shards.
[12] The other nice thing about RAID is that with the right RAID level, you get disk redundancy.