where manufacturer precedes price, the number of entries scanned would be the
same as the number of entries returned. This is because once you've arrived at the
entry for (Acme - 7500), it's a simple, in-order scan to serve the query.
So the order of keys in a compound index matters. If that seems clear, then the second thing you should understand is why we've chosen the first ordering over the second. This may be obvious from the diagrams, but there's another way to look at the problem. Look again at the query: the two query terms specify different kinds of matches. On manufacturer, you want to match the term exactly. But on price, you want to match a range of values, beginning with 7500. As a general rule, a query where one term demands an exact match and another specifies a range requires a compound index where the range key comes second. We'll revisit this idea in the section on query optimization.
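To make the rule concrete, here's a minimal shell sketch, assuming a products collection with manufacturer and price fields (the collection and field names are illustrative, not taken from the text above):

// Compound index with the exact-match key first and the range key second
db.products.ensureIndex({manufacturer: 1, price: 1})   // newer shells name this createIndex()

// Exact match on manufacturer plus a range on price: the query can be served
// by an in-order scan of the index starting at the entry (Acme - 7500)
db.products.find({manufacturer: "Acme", price: {$gte: 7500}})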
INDEX EFFICIENCY
Although indexes are essential for good query performance, each new index imposes
a small maintenance cost. It should be easy to see why. Whenever you add a document
to a collection, each index on that collection must be modified to include the new
document. So if a particular collection has 10 indexes, then that makes 10 separate
structures to modify on each insert. This holds for any write operation, whether you're
removing a document or updating a given document's indexed keys.
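One quick way to see how many structures each write touches is simply to list a collection's indexes; this is a small sketch, again assuming a products collection:

// Every index listed here must be updated on each insert, on each remove,
// and on any update that changes an indexed key
db.products.getIndexes()            // array of index definitions
db.products.getIndexes().length     // number of structures modified per write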
For read-intensive applications, the cost of indexes is almost always justified. Just
realize that indexes do impose a cost and that they therefore must be chosen with
care. This means ensuring that all of your indexes are used and that none of them are
redundant. You can do this in part by profiling your application's queries, and I'll
describe this process later in the chapter.
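As a preview, explain() is one quick way to check whether a particular query actually uses an index; the exact output fields vary by server version, and the collection name below is still just an assumption:

// Shows which index (if any) was chosen and how much work the query did
db.products.find({manufacturer: "Acme", price: {$gte: 7500}}).explain()
// Older servers report cursor, nscanned, and n; newer servers return a
// queryPlanner document (and execution statistics with explain("executionStats"))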
But there's a second consideration here. Even with all the right indexes in place,
it's still possible that those indexes won't result in faster queries. This occurs when
indexes and a working data set don't fit in RAM.
You may recall from chapter 1 that MongoDB tells the operating system to map all data files to memory using the mmap() system call. From this point on, the data files, which include all documents, collections, and their indexes, are swapped in and out of RAM by the operating system in 4 KB chunks called pages.⁴ Whenever data from a given page is requested, the operating system must ensure that the page is available in RAM. If it's not, then a kind of exception known as a page fault is raised, and this tells the memory manager to load the page from disk into RAM.
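A rough way to judge whether your data and indexes are likely to fit in RAM is to compare the sizes reported by db.stats() against the physical memory on the machine; a minimal sketch:

// Report database sizes in megabytes (the optional scale argument divides byte counts)
db.stats(1024 * 1024)
// Compare dataSize + indexSize against available RAM; if the working set
// exceeds it, expect page faults and slower queries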
With sufficient RAM, all of the data files in use will eventually be loaded into memory. Whenever that memory is altered, as in the case of a write, those changes will be flushed to disk asynchronously by the OS, but the write will be fast, occurring directly in RAM. When data fits into RAM, you have the ideal situation because the number of disk accesses is reduced to a minimum. But if the working data set can't fit into RAM, then page faults will start to creep up. This means that the operating system will be
⁴ The 4 KB page size is standard but not universal.