where manufacturer precedes price, the number of entries scanned would be the
same as the number of entries returned. This is because once you've arrived at the
entry for (Acme - 7500), it's a simple, in-order scan to serve the query.
So the order of keys in a compound index matters. If that seems clear, then the second thing you should understand is why we've chosen the first ordering over the second. This may be obvious from the diagrams, but there's another way to look at the problem. Look again at the query: the two query terms specify different kinds of matches. On manufacturer, you want to match the term exactly. But on price, you want to match a range of values, beginning with 7500. As a general rule, a query where one term demands an exact match and another specifies a range requires a compound index where the range key comes second. We'll revisit this idea in the section on query optimization.
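To make the rule concrete, here's a minimal shell sketch, assuming a products collection with manufacturer and price fields (the collection and field names are illustrative, not taken from the text above):

// Compound index with the exact-match key first and the range key second
db.products.ensureIndex({manufacturer: 1, price: 1})   // newer shells name this createIndex()

// Exact match on manufacturer plus a range on price: the query can be served
// by an in-order scan of the index starting at the entry (Acme - 7500)
db.products.find({manufacturer: "Acme", price: {$gte: 7500}})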
INDEX EFFICIENCY
Although indexes are essential for good query performance, each new index imposes
a small maintenance cost. It should be easy to see why. Whenever you add a document
to a collection, each index on that collection must be modified to include the new
document. So if a particular collection has 10 indexes, then that makes 10 separate
structures to modify on each insert. This holds for any write operation, whether you're
removing a document or updating a given document's indexed keys.
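One quick way to see how many structures each write touches is simply to list a collection's indexes; this is a small sketch, again assuming a products collection:

// Every index listed here must be updated on each insert, on each remove,
// and on any update that changes an indexed key
db.products.getIndexes()            // array of index definitions
db.products.getIndexes().length     // number of structures modified per write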
For read-intensive applications, the cost of indexes is almost always justified. Just
realize that indexes do impose a cost and that they therefore must be chosen with
care. This means ensuring that all of your indexes are used and that none of them are
redundant. You can do this in part by profiling your application's queries, and I'll
describe this process later in the chapter.
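As a preview, explain() is one quick way to check whether a particular query actually uses an index; the exact output fields vary by server version, and the collection name below is still just an assumption:

// Shows which index (if any) was chosen and how much work the query did
db.products.find({manufacturer: "Acme", price: {$gte: 7500}}).explain()
// Older servers report cursor, nscanned, and n; newer servers return a
// queryPlanner document (and execution statistics with explain("executionStats"))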
But there's a second consideration here. Even with all the right indexes in place,
it's still possible that those indexes won't result in faster queries. This occurs when
indexes and a working data set don't fit in RAM.
You may recall from chapter 1 that MongoDB tells the operating system to map all data files to memory using the mmap() system call. From this point on, the data files, which include all documents, collections, and their indexes, are swapped in and out of RAM by the operating system in 4 KB chunks called pages.⁴ Whenever data from a given page is requested, the operating system must ensure that the page is available in RAM. If it's not, then a kind of exception known as a page fault is raised, and this tells the memory manager to load the page from disk into RAM.
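A rough way to judge whether your data and indexes are likely to fit in RAM is to compare the sizes reported by db.stats() against the physical memory on the machine; a minimal sketch:

// Report database sizes in megabytes (the optional scale argument divides byte counts)
db.stats(1024 * 1024)
// Compare dataSize + indexSize against available RAM; if the working set
// exceeds it, expect page faults and slower queries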
With sufficient RAM, all of the data files in use will eventually be loaded into memory. Whenever that memory is altered, as in the case of a write, those changes will be flushed to disk asynchronously by the OS, but the write will be fast, occurring directly in RAM. When data fits into RAM, you have the ideal situation because the number of disk accesses is reduced to a minimum. But if the working data set can't fit into RAM, then page faults will start to creep up. This means that the operating system will be
⁴ The 4 KB page size is standard but not universal.