Use SSDs
SSDs (solid state drives) are much faster than spinning hard disks for many workloads,
but they are often smaller and more expensive, they are difficult to erase securely, and
they still do not come close to the speed at which you can read from memory. This isn't
to discourage you from using them: they usually work fantastically with MongoDB,
but they aren't a magical cure-all.
Add more RAM
Adding more RAM means you have to hit disk less. However, adding RAM will
only get you so far—at some point, your data isn't going to fit in RAM anymore.
So, the question becomes: how do we store terabytes (petabytes?) of data on disk, but
program an application that will mostly access data already in memory and move data
from disk to memory as infrequently as possible?
If you literally access all of your data randomly in real time, you're just going to need
a lot of RAM. However, most applications don't: recent data is accessed more than
older data, certain users are more active than others, and certain regions have more
customers than others. Applications like these can be designed to keep certain documents
in memory and go to disk very infrequently.
Tip #22: Use indexes to do more with less memory
First, just so we're all on the same page, Figure 3-1 shows the sequence a read request
takes.
We'll assume, for this topic, that a page of memory is 4KB, although this is not
universally true.
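If you're curious what the page size actually is on your machine, Unix-like systems expose it through `sysconf`; a quick check from Python (this is an aside of mine, not from the original text):

```python
import os

# Ask the OS for its virtual-memory page size in bytes.
# 4096 (4KB) is the common value on x86 Linux, but other
# platforms (e.g. some ARM systems) use larger pages.
page_size = os.sysconf("SC_PAGE_SIZE")
print(page_size)
```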
So, let's say you have a machine with 256GB of data and 16GB of memory. Let's say
most of this data is in one collection and you query this collection. What does
MongoDB do?
MongoDB loads the first page of documents from disk into memory, and compares
those to your query. Then it loads the next page and compares those. Then it loads the
next page. And so on, through 256GB of data. It can't take any shortcuts: it cannot
know if a document matches without looking at the document, so it must look at every
document. Thus, it will need to load all 256GB into memory (the OS takes care of
swapping the oldest pages out of memory as it needs room for new ones). This is going
to take a long, long time.
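To make the cost concrete, here is a toy Python sketch of an unindexed query (my own illustration, not MongoDB's actual code): with no index, every single document must be examined, no matter how few of them match.

```python
# A stand-in "collection" of 10,000 documents with a field x.
collection = [{"_id": i, "x": i % 100} for i in range(10_000)]

def unindexed_find(docs, field, value):
    """Scan every document; also count how many we had to examine."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1  # a match can't be ruled out without looking
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined

matches, examined = unindexed_find(collection, "x", 42)
print(examined)      # 10000 -- every document was touched
print(len(matches))  # 100   -- even though only 1% matched
```

The ratio is the point: 10,000 documents examined to return 100. On a real 256GB collection, "examined" means "paged in from disk."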
How can we avoid loading all 256GB into memory every time we do a query? We can
tell MongoDB to create an index on a given field, x, and MongoDB will create a tree of
the collection's values for that field. MongoDB basically preprocesses the data, adding
every x value in the collection to an ordered tree (see Figure 3-2). Each index entry in
the tree contains a value of x and a pointer to the document with that x value. The tree
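A rough Python sketch of the idea, using a sorted list of (value, pointer) pairs as a stand-in for MongoDB's on-disk tree (a simplification of mine, not the actual B-tree implementation):

```python
import bisect

# Same stand-in collection as before: 10,000 documents with a field x.
collection = [{"_id": i, "x": i % 100} for i in range(10_000)]

# "Create the index": every x value, ordered, each paired with a
# pointer back to its document (here, the document's list offset).
index = sorted((doc["x"], pos) for pos, doc in enumerate(collection))
keys = [value for value, _pos in index]

def indexed_find(value):
    """Binary-search the index, then follow pointers to the documents."""
    lo = bisect.bisect_left(keys, value)   # first entry with this value
    hi = bisect.bisect_right(keys, value)  # one past the last entry
    return [collection[pos] for _value, pos in index[lo:hi]]

docs = indexed_find(42)
print(len(docs))  # 100 -- found without examining non-matching documents
```

Because the entries are ordered, finding the matching range takes a handful of comparisons instead of a pass over the whole collection, and only the matching documents are ever followed back to.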
 