In-Depth Information
Profiling Queries
A built-in profiling tool lets you see how MongoDB works out which documents to return. This is useful because, in many cases, a query can be improved simply by adding an index. If you have a complicated query and you're not sure why it's running so slowly, the query profiler can provide you with extremely valuable information. Again, you'll learn more about the MongoDB Profiler in Chapter 10.
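As a small taste of what Chapter 10 covers, the following mongo shell commands switch the profiler on and then inspect its most recent entry; the blog collection and author field used here are purely illustrative:

// Record every operation on the current database (level 2); level 1 records only slow operations.
db.setProfilingLevel(2)

// Run a query, then look at the newest entry the profiler wrote to system.profile.
db.blog.find({author: "alice"})
db.system.profile.find().sort({ts: -1}).limit(1).pretty()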
Updating Information In-Place
When a database updates a row (or in the case of MongoDB, a document), it has a couple of choices about how to do
it. Many databases choose the multi-version concurrency control (MVCC) approach, which allows multiple users to
see different versions of the data. This approach is useful because it ensures that the data won't be changed partway
through by another program during a given transaction.
The downside to this approach is that the database needs to track multiple copies of the data. For example,
CouchDB provides very strong versioning, but this comes at the cost of writing the data out in its entirety. While this
ensures that the data is stored in a robust fashion, it also increases complexity and reduces performance.
MongoDB, on the other hand, updates information in-place. This means that (in contrast to CouchDB) MongoDB can update the data wherever it happens to be. This typically means that no extra space needs to be allocated, and the indexes can be left untouched.
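As a rough sketch of what an in-place update looks like from the mongo shell (the users collection and its fields are made up for illustration), a modifier such as $set rewrites a single field without rebuilding the whole document:

// Overwrite one field; as long as the document doesn't grow, it can be rewritten where it already sits on disk.
db.users.update({username: "alice"}, {$set: {status: "active"}})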
Another benefit of this method is that MongoDB performs lazy writes. Reading from and writing to memory is very fast, but writing to disk is thousands of times slower, so you want to limit disk access as much as possible. This isn't possible in CouchDB, because that program ensures that each document is quickly written to disk. While this approach guarantees that the data is written safely, it also impacts performance significantly.
MongoDB only writes to disk when it has to, which is usually once every second or so. This means that if a value
is being updated many times a second—a not uncommon scenario if you're using a value as a page counter or for live
statistics—then the value will only be written once, rather than the thousands of times that CouchDB would require.
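A counter like that is typically maintained with the $inc modifier. In a minimal sketch (the counters collection is just an example), each hit is a tiny in-memory change that MongoDB flushes to disk later:

// Bump the counter; the final true creates the document if it doesn't exist yet (an upsert).
db.counters.update({page: "/index.html"}, {$inc: {hits: 1}}, true)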
This approach makes MongoDB much faster, but, again, it comes with a tradeoff. CouchDB may be slower, but it
does guarantee that data is stored safely on the disk. MongoDB makes no such guarantee, and this is why a traditional
RDBMS is probably a better solution for managing critical data such as billing or accounts receivable.
Storing Binary Data
GridFS is MongoDB's solution to storing binary data in the database. BSON supports saving up to 4MB of binary data
in a document, and this may well be enough for your needs. For example, if you want to store a profile picture or a sound
clip, then 4MB might be more space than you need. On the other hand, if you want to store movie clips, high-quality
audio clips, or even files that are several hundred megabytes in size, then MongoDB has you covered here, too.
GridFS works by storing the information about the file (called metadata) in the files collection. The data itself is broken down into pieces called chunks that are stored in the chunks collection. This approach makes storing data both easy and scalable; it also makes range operations (such as retrieving specific parts of a file) much easier to perform.
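To make this a little more concrete, here is roughly what a stored file looks like from the mongo shell, assuming the default fs collection prefix:

// The file's metadata (name, length, upload date, and so on) lives in a single document in the files collection.
db.fs.files.findOne()

// Its contents are split across the chunks collection; each chunk points back to its file via files_id and is ordered by n. The binary data field is excluded here to keep the output readable.
db.fs.chunks.find({}, {data: 0}).sort({n: 1})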
Generally speaking, you would use GridFS through your programming language's MongoDB driver, so it's
unlikely you'd ever have to get your hands dirty at such a low level. As with everything else in MongoDB, GridFS is
designed for both speed and scalability. This means you can be confident that MongoDB will be up to the task if you
want to work with large data files.
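That said, if you want to experiment without writing any driver code, the mongofiles utility that ships with MongoDB offers a simple command-line front end to GridFS; a session might look something like this (the file name is only an example):

$ mongofiles put video.mp4
$ mongofiles list
$ mongofiles get video.mp4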
Replicating Data
When we talked about the guiding principles behind MongoDB, we mentioned that relational databases offer certain
guarantees for data storage that are not available in MongoDB. These guarantees weren't implemented for a handful
of reasons. First, these features would slow the database down. Second, they would greatly increase the complexity of
the program. Third, it was felt that the most common failure on a server would be hardware, which would render the
data unusable anyway, even if the data were safely saved to disk.
 