Consider a common problem in one of the application types that MongoDB is well suited for: logging.
MongoDB's extraordinary write rate makes streaming events as small documents into a collection very efficient.
However, if you want to further optimize the speed at which you can stream these events, there are a couple of
things you can do.
First, you can consider batching your inserts. MongoDB's insert() call accepts multiple documents, so you can
use it to place several events into a collection at the same time. This results in fewer round trips through the
driver API.
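The batching pattern can be sketched independently of any particular driver: accumulate events in a buffer and hand the whole batch to a single bulk-insert call (with a real driver the flush callback would be something like pymongo's collection.insert_many). The class and parameter names below are illustrative, not part of any MongoDB API:

```python
class EventBatcher:
    """Accumulates event documents and flushes them in one bulk call.

    `flush_fn` stands in for a bulk insert such as pymongo's
    collection.insert_many; any callable taking a list of dicts works.
    """

    def __init__(self, flush_fn, batch_size=100):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.buffer = []

    def log(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)  # one round trip for many documents
            self.buffer = []
```

With this shape, writing batch_size events costs one round trip to the server instead of batch_size separate ones; call flush() once more at shutdown so no trailing events are lost.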
Second (and more importantly), you can reduce the size of your field names. If you have smaller field names,
MongoDB can pack more event records into memory before it has to flush them out to disk. This makes the whole
system more efficient.
For example, assume you have a collection that is used to log three fields: a time stamp, a counter, and a
four-character string used to indicate the source of the data. The total storage size of your data is shown in Table 10-2.
Table 10-2. The Logging Example Collection Storage Size

Field        Size
Timestamp    8 bytes
Integer      4 bytes
String       4 bytes
Total        16 bytes
If you use ts, n, and src for the field names, then the total size of the field names is 6 bytes. This is a relatively
small value compared to the data size. But now assume you decided to name the fields WhenTheEventHappened,
NumberOfEvents, and SourceOfEvents. In this case, the total size of the field names is 48 bytes, or three times the size
of the data itself. If you wrote 1TB of data into a collection, then you would be storing 750GB of field names, but only
250GB of actual data.
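The arithmetic behind these figures is easy to check. (A real BSON document also carries a type byte and a null terminator per field, which are ignored here to keep the comparison simple.)

```python
# Payload per document, from Table 10-2: timestamp + int32 + 4-char string.
data_bytes = 8 + 4 + 4  # 16 bytes

short_names = ["ts", "n", "src"]
long_names = ["WhenTheEventHappened", "NumberOfEvents", "SourceOfEvents"]

short_overhead = sum(len(n) for n in short_names)  # 6 bytes of names
long_overhead = sum(len(n) for n in long_names)    # 48 bytes of names

# With the long names, 48 of every 64 bytes written per document
# is field names rather than data -- the 750GB-per-TB figure above.
name_fraction = long_overhead / (long_overhead + data_bytes)
```

Because the field names are repeated in every document, this overhead scales linearly with the number of events logged.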
This does more than waste disk space. It also affects all other aspects of the system's performance, including the
index size, data transfer time, and (probably more importantly) the use of precious system RAM to cache the data files.
In logging applications you also need to avoid adding indexes on your collections when writing records; as
explained earlier, indexes take time and resources to maintain. Instead, you should add the index immediately before
you start analyzing the data.
Finally, you should consider using a schema that splits the event stream into multiple collections. For example,
you might write each day's events into a separate collection. Smaller collections take less time to index and analyze.
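One common way to implement this split is to derive the collection name from each event's date. The naming scheme and function below are just one possible convention, not anything mandated by MongoDB:

```python
from datetime import datetime, timezone


def daily_collection_name(ts, prefix="events"):
    """Return a per-day collection name such as 'events_20240131'."""
    return f"{prefix}_{ts.strftime('%Y%m%d')}"


# With a real driver you would then route each write, for example:
# db[daily_collection_name(event_time)].insert_one(event)  # hypothetical usage
name = daily_collection_name(datetime(2024, 1, 31, tzinfo=timezone.utc))
```

A side benefit of this layout is that expiring old data becomes a cheap drop of an entire collection rather than a large delete inside one big collection.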
Summary
In this chapter, we looked at some tools for tracking down slow performance in MongoDB queries, as well as potential
solutions for speeding up the slow queries that surface as a result. We also looked at some of the ways to optimize
data storage. For example, we looked at ways to ensure that we are making full use of the resources available to the
MongoDB server.
The specific techniques described in this chapter enable you to optimize your data and tune the MongoDB
system it is stored in. The best approach to take will vary from application to application, and it will depend on many
factors, including the application type, data access patterns, read/write ratios, and so on.