going to disk frequently, greatly slowing read and write operations. In the worst case,
as data size becomes much larger than available RAM, a situation can occur where, for
any read or write, data must be paged to and from disk. This is known as thrashing, and
it causes performance to take a severe dive.
Fortunately, this situation is relatively easy to avoid. At minimum, you need to
make sure that your indexes will fit in RAM. This is one reason why it's important to
avoid creating any unneeded indexes. With extra indexes in place, more RAM will be
required to maintain those indexes. Along the same lines, each index should have
only the keys it needs: a triple-key compound index might be necessary at times, but
be aware that it'll use more space than a simple single-key index.
Ideally, indexes and a working data set fit in RAM. But estimating how much RAM
this requires for any given deployment isn't always easy. You can always discover total
index size by looking at the results of the stats command. But finding out working
set size is less clear-cut because it's different for every application. The working set is the
subset of total data commonly queried and updated. For instance, suppose you have a
million users. If only half of them are active, then your working set for the user collection
is half the total data size. If all users are active, then the working set is equal to the
entire data set.
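The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 1 KB average document size, the 64 MiB index size, and the 1 GiB of RAM are all hypothetical figures chosen for the example, not values from any real deployment.

```python
def working_set_bytes(total_data_bytes: int, active_fraction: float) -> int:
    """Estimate working set size as the actively used share of the data."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return int(total_data_bytes * active_fraction)


def fits_in_ram(index_bytes: int, working_bytes: int, ram_bytes: int) -> bool:
    """Return True when indexes plus working set fit in available RAM."""
    return index_bytes + working_bytes <= ram_bytes


# Hypothetical figures: 1 million users at roughly 1 KB per document.
total_data = 1_000_000 * 1024
index_size = 64 * 1024 * 1024   # assumed total index size, e.g. as read from stats
ram = 1024 * 1024 * 1024        # assumed 1 GiB of RAM available to the database

half_active = working_set_bytes(total_data, 0.5)  # only half the users are active
all_active = working_set_bytes(total_data, 1.0)   # every user is active

print(half_active, fits_in_ram(index_size, half_active, ram))
print(all_active, fits_in_ram(index_size, all_active, ram))
```

With these made-up numbers, the half-active case fits comfortably, while the all-active case pushes indexes plus working set just past available RAM. In practice the active fraction is the hard part to estimate, because it depends on the application's access patterns.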
In chapter 10, we'll revisit the concept of the working set, and we'll look at specific
ways to diagnose hardware-related performance issues. For now, be aware of the
potential costs of adding new indexes, and keep an eye on the ratio of index and
working set size to RAM. Doing so will help you to maintain good performance as your
data grows.
7.1.3 B-trees
As mentioned, MongoDB represents indexes internally as B-trees. B-trees are ubiquitous
(see http://mng.bz/wQfG), having remained in popular use for database
records and indexes since at least the late 1970s. [5] If you've used other database systems,
then you may already be familiar with the various consequences of using B-trees.
This is good because it means you can effectively transfer most of your knowledge of
indexing. If you don't know much about B-trees, that's okay, too; this section will present
the concepts most relevant to your work with MongoDB.
B-trees have two overarching traits that make them ideal for database indexes.
First, they facilitate a variety of queries, including exact matches, range conditions,
sorting, prefix matching, and index-only queries. Second, they're able to remain balanced
in spite of the addition and removal of keys.
We'll look at a simple representation of a B-tree and then discuss some principles
that you'll want to keep in mind. Imagine that you have a collection of users and
that you've created a compound index on last name and age. [6] An abstract representation
of the resulting B-tree might look something like figure 7.5.
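To make the query types mentioned earlier a little more concrete, here is a minimal Python sketch. It isn't a B-tree and isn't how MongoDB implements its indexes; it uses a plain sorted list with the standard bisect module to mimic the one property that matters here, namely that keys are kept in order. The sample names, ages, and function names are hypothetical.

```python
import bisect

# Hypothetical index entries: (last_name, age) keys kept in sorted order,
# much as an ordered index would keep them.
keys = sorted([
    ("Adams", 31), ("Baker", 24), ("Baker", 47),
    ("Chen", 19), ("Chen", 35), ("Chen", 62), ("Diaz", 28),
])


def exact(keys, last_name, age):
    """Exact match on both fields of the compound key."""
    i = bisect.bisect_left(keys, (last_name, age))
    return i < len(keys) and keys[i] == (last_name, age)


def prefix(keys, last_name):
    """All keys sharing the leading field (a prefix of the compound key)."""
    lo = bisect.bisect_left(keys, (last_name,))
    hi = bisect.bisect_right(keys, (last_name, float("inf")))
    return keys[lo:hi]


def age_range(keys, last_name, min_age, max_age):
    """Keys for one last name with age in [min_age, max_age]."""
    lo = bisect.bisect_left(keys, (last_name, min_age))
    hi = bisect.bisect_right(keys, (last_name, max_age))
    return keys[lo:hi]


print(exact(keys, "Baker", 47))
print(prefix(keys, "Chen"))
print(age_range(keys, "Chen", 30, 70))
```

Because the keys are ordered first by last name and then by age, each query reduces to locating one or two positions and reading the entries in between, and the results come back already sorted. The same ordering is what lets a compound index serve exact matches, prefix matches, and range conditions.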
[5] MongoDB uses B-trees for its indexes only; collections are stored as doubly-linked lists.
[6] Indexing on last name and age is a bit contrived, but it nicely illustrates the concepts.