Indexing and query optimization - MongoDB in Action


going to disk frequently, greatly slowing read and write operations. In the worst case, as data size becomes much larger than available RAM, a situation can occur where, for any read or write, data must be paged to and from disk. This is known as thrashing, and it causes performance to take a severe dive.

Fortunately, this situation is relatively easy to avoid. At minimum, you need to make sure that your indexes will fit in RAM. This is one reason why it's important to avoid creating any unneeded indexes: every extra index requires more RAM to maintain. Along the same lines, each index should have only the keys it needs. A triple-key compound index might be necessary at times, but be aware that it'll use more space than a simple single-key index.

Ideally, both the indexes and the working data set fit in RAM. But estimating how much RAM this requires for any given deployment isn't always easy. You can always discover total index size by looking at the results of the stats command. Finding out the working set size is less clear-cut because it's different for every application. The working set is the subset of total data commonly queried and updated. For instance, suppose you have a million users. If only half of them are active, then your working set for the user collection is half the total data size. If all users are active, then the working set is equal to the entire data set.
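The arithmetic above can be sketched as a rough capacity check. This is only an illustration under stated assumptions: the dictionary is a hand-written stand-in shaped like a collection stats result (the field names `size` and `totalIndexSize` follow what the stats command reports), all byte counts are invented, and the helper functions `estimate_working_set` and `fits_in_ram` are hypothetical names, not part of MongoDB.

```python
# Rough capacity check: do the indexes plus the working set fit in RAM?
# Every number here is invented for illustration.

GIB = 1024 ** 3

# Hand-written stand-in for the relevant fields of a collection stats result.
users_stats = {
    "size": 8 * GIB,            # total data size in bytes
    "totalIndexSize": 1 * GIB,  # combined size of all indexes in bytes
}

def estimate_working_set(data_size_bytes, active_fraction):
    """Approximate the working set as the actively used share of the data."""
    return int(data_size_bytes * active_fraction)

def fits_in_ram(stats, active_fraction, ram_bytes):
    """Return True if indexes plus the estimated working set fit in RAM."""
    working_set = estimate_working_set(stats["size"], active_fraction)
    return stats["totalIndexSize"] + working_set <= ram_bytes

# With 6 GiB of RAM: half the users active needs 1 + 4 = 5 GiB,
# all users active needs 1 + 8 = 9 GiB.
half_active = fits_in_ram(users_stats, 0.5, 6 * GIB)
all_active = fits_in_ram(users_stats, 1.0, 6 * GIB)
print(half_active, all_active)
```

Treating the working set as a simple fraction of total data size is a simplification; real access patterns are rarely that uniform, so a figure like this is best read as a first estimate.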

In chapter 10, we'll revisit the concept of the working set, and we'll look at specific ways to diagnose hardware-related performance issues. For now, be aware of the potential costs of adding new indexes, and keep an eye on the ratio of index and working set size to RAM. Doing so will help you to maintain good performance as your data grows.

7.1.3 B-trees

As mentioned, MongoDB represents indexes internally as B-trees. B-trees are ubiquitous (see http://mng.bz/wQfG), having remained in popular use for database records and indexes since at least the late 1970s. [5] If you've used other database systems, then you may already be familiar with the various consequences of using B-trees. This is good because it means you can effectively transfer most of your knowledge of indexing. If you don't know much about B-trees, that's okay, too; this section will present the concepts most relevant to your work with MongoDB.

B-trees have two overarching traits that make them ideal for database indexes. First, they facilitate a variety of queries, including exact matches, range conditions, sorting, prefix matching, and index-only queries. Second, they're able to remain balanced in spite of the addition and removal of keys.

We'll look at a simple representation of a B-tree and then discuss some principles that you'll want to keep in mind. Imagine that you have a collection of users and that you've created a compound index on last name and age. [6] An abstract representation of the resulting B-tree might look something like figure 7.5.
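To make the ordering of compound keys concrete, here is a minimal sketch that uses a sorted Python list as a stand-in for the index. It is not a B-tree and not how MongoDB stores keys; it only assumes that keys are kept in (last name, age) order, which is enough to show why exact matches, prefix matches, and range conditions can all be answered by searching sorted keys. The sample names, ages, and function names are invented.

```python
import bisect

# Compound keys (last_name, age), kept in sorted order as index keys would be.
keys = sorted([
    ("Adams", 41), ("Baker", 25), ("Baker", 33),
    ("Chen", 19), ("Chen", 52), ("Davis", 30),
])

def exact(last_name, age):
    """Exact match on both fields via binary search."""
    i = bisect.bisect_left(keys, (last_name, age))
    return i < len(keys) and keys[i] == (last_name, age)

def by_last_name(last_name):
    """Prefix match: every key with this last name, already ordered by age."""
    lo = bisect.bisect_left(keys, (last_name,))
    hi = bisect.bisect_left(keys, (last_name, float("inf")))
    return keys[lo:hi]

def age_range(last_name, min_age, max_age):
    """Range condition on age within a single last name (inclusive bounds)."""
    lo = bisect.bisect_left(keys, (last_name, min_age))
    hi = bisect.bisect_right(keys, (last_name, max_age))
    return keys[lo:hi]

print(exact("Baker", 33))
print(by_last_name("Baker"))
print(age_range("Chen", 18, 40))
```

Each lookup touches one contiguous slice of the sorted keys, and results come back already sorted by last name and then age, which is the property the B-tree preserves as keys are added and removed.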

[5] MongoDB uses B-trees for its indexes only; collections are stored as doubly-linked lists.

[6] Indexing on last name and age is a bit contrived, but it nicely illustrates the concepts.
