Consider a common problem in one of the application types that MongoDB is well suited for: logging.
MongoDB's extraordinary write rate makes streaming events as small documents into a collection very efficient.
However, if you want to further optimize the speed at which you can stream these events, there are a couple of
things you can do.
First, you can consider batching your inserts. MongoDB's insert() call accepts multiple documents, so you can
use it to place several events into a collection at the same time. This results in fewer round trips through the
driver API.
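The batching pattern can be sketched independently of any particular driver: accumulate events in a buffer and hand the whole batch to a single bulk-insert call (with a real driver the flush callback would be something like pymongo's collection.insert_many). The class and parameter names below are illustrative, not part of any MongoDB API:

```python
class EventBatcher:
    """Accumulates event documents and flushes them in one bulk call.

    `flush_fn` stands in for a bulk insert such as pymongo's
    collection.insert_many; any callable taking a list of dicts works.
    """

    def __init__(self, flush_fn, batch_size=100):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.buffer = []

    def log(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)  # one round trip for many documents
            self.buffer = []
```

With this shape, writing batch_size events costs one round trip to the server instead of batch_size separate ones; call flush() once more at shutdown so no trailing events are lost.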
Second (and more importantly), you can reduce the size of your field names. If you have smaller field names,
MongoDB can pack more event records into memory before it has to flush them out to disk. This makes the whole
system more efficient.
For example, assume you have a collection that is used to log three fields: a time stamp, a counter, and a
four-character string used to indicate the source of the data. The total storage size of your data is shown in Table 10-2.
Table 10-2. The Logging Example Collection Storage Size

Field        Size
Timestamp    8 bytes
Integer      4 bytes
String       4 bytes
Total        16 bytes
If you use ts, n, and src for the field names, then the total size of the field names is 6 bytes. This is a relatively
small value compared to the data size. But now assume you decided to name the fields WhenTheEventHappened,
NumberOfEvents, and SourceOfEvents. In this case, the total size of the field names is 48 bytes, or three times the size
of the data itself. If you wrote 1TB of data into a collection, then you would be storing 750GB of field names, but only
250GB of actual data.
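The arithmetic behind these figures is easy to check. (A real BSON document also carries a type byte and a null terminator per field, which are ignored here to keep the comparison simple.)

```python
# Payload per document, from Table 10-2: timestamp + int32 + 4-char string.
data_bytes = 8 + 4 + 4  # 16 bytes

short_names = ["ts", "n", "src"]
long_names = ["WhenTheEventHappened", "NumberOfEvents", "SourceOfEvents"]

short_overhead = sum(len(n) for n in short_names)  # 6 bytes of names
long_overhead = sum(len(n) for n in long_names)    # 48 bytes of names

# With the long names, 48 of every 64 bytes written per document
# is field names rather than data -- the 750GB-per-TB figure above.
name_fraction = long_overhead / (long_overhead + data_bytes)
```

Because the field names are repeated in every document, this overhead scales linearly with the number of events logged.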
This does more than waste disk space. It also affects all other aspects of the system's performance, including the
index size, data transfer time, and (probably more importantly) the use of precious system RAM to cache the data files.
In logging applications you also need to avoid adding indexes on your collections when writing records; as
explained earlier, indexes take time and resources to maintain. Instead, you should add the index immediately before
you start analyzing the data.
Finally, you should consider using a schema that splits the event stream into multiple collections. For example,
you might write each day's events into a separate collection. Smaller collections take less time to index and analyze.
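One common way to implement this split is to derive the collection name from each event's date. The naming scheme and function below are just one possible convention, not anything mandated by MongoDB:

```python
from datetime import datetime, timezone


def daily_collection_name(ts, prefix="events"):
    """Return a per-day collection name such as 'events_20240131'."""
    return f"{prefix}_{ts.strftime('%Y%m%d')}"


# With a real driver you would then route each write, for example:
# db[daily_collection_name(event_time)].insert_one(event)  # hypothetical usage
name = daily_collection_name(datetime(2024, 1, 31, tzinfo=timezone.utc))
```

A side benefit of this layout is that expiring old data becomes a cheap drop of an entire collection rather than a large delete inside one big collection.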
Summary
In this chapter, we looked at some tools for tracking down slow performance in MongoDB queries, as well as potential
solutions for speeding up the slow queries that surface as a result. We also looked at some of the ways to optimize
data storage. For example, we looked at ways to ensure that we are making full use of the resources available to the
MongoDB server.
The specific techniques described in this chapter enable you to optimize your data and tune the MongoDB
system it is stored in. The best approach to take will vary from application to application, and it will depend on many
factors, including the application type, data access patterns, read/write ratios, and so on.