Advanced Topics - MongoDB: The Definitive Guide

The API for working with GridFS from PyMongo is very similar to that of mongofiles: we can easily perform the basic put, get, and list operations. Almost all of the MongoDB drivers follow this basic pattern for working with GridFS, while often exposing more advanced functionality as well. For driver-specific information on GridFS, please check out the documentation for the specific driver you're using.

Under the Hood

GridFS is a lightweight specification for storing files that is built on top of normal MongoDB documents. The MongoDB server actually does almost nothing to “special-case” the handling of GridFS requests; all of the work is handled by the client-side drivers and tools.

The basic idea behind GridFS is that we can store large files by splitting them up into chunks and storing each chunk as a separate document. Because MongoDB supports storing binary data in documents, we can keep storage overhead for chunks to a minimum. In addition to storing each chunk of a file, we store a single document that groups the chunks together and contains metadata about the file.
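
To make the splitting idea concrete, here is a minimal pure-Python sketch (no MongoDB server involved) that cuts a byte string into fixed-size pieces. The function name split_into_chunks and the tiny 4-byte chunk size are illustrative assumptions, not part of any driver's API; real drivers use a far larger chunk size.

```python
import math


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into consecutive pieces of at most chunk_size bytes.

    Every piece except possibly the last is exactly chunk_size bytes long.
    (Hypothetical helper for illustration only.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


content = b"Hello, GridFS!"  # 14 bytes of example "file" content
chunks = split_into_chunks(content, chunk_size=4)
print(chunks)                                      # [b'Hell', b'o, G', b'ridF', b'S!']
print(len(chunks) == math.ceil(len(content) / 4))  # True
print(b"".join(chunks) == content)                 # True
```

Two properties fall out directly: a file of L bytes needs ceil(L / chunk_size) chunks, and joining the chunks in order gives back the original bytes, which is why each stored chunk must remember its position.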

The chunks for GridFS are stored in their own collection. By default chunks will use the collection fs.chunks, but this can be overridden if needed. Within the chunks collection the structure of the individual documents is pretty simple:

{
    "_id" : ObjectId("..."),
    "n" : 0,
    "data" : BinData("..."),
    "files_id" : ObjectId("...")
}
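
The chunk layout above can be mirrored with plain Python dictionaries. This sketch is illustrative only: it uses uuid4 hex strings as stand-ins for ObjectId values (a real driver generates genuine ObjectIds), and the helper name make_chunk_docs is hypothetical.

```python
import uuid


def make_chunk_docs(files_id: str, data: bytes, chunk_size: int) -> list[dict]:
    """Build one chunk document per chunk_size-byte piece of data.

    Each document mirrors the fs.chunks layout: its own "_id", the chunk
    number "n" (starting at 0), the raw bytes in "data", and the
    "files_id" of the owning file document.
    NOTE: uuid hex strings stand in for ObjectId values here.
    """
    docs = []
    for n, start in enumerate(range(0, len(data), chunk_size)):
        docs.append({
            "_id": uuid.uuid4().hex,
            "n": n,
            "data": data[start:start + chunk_size],
            "files_id": files_id,
        })
    return docs


file_id = uuid.uuid4().hex  # stand-in for the file document's ObjectId
for chunk in make_chunk_docs(file_id, b"Hello, GridFS!", chunk_size=4):
    print(chunk["n"], chunk["data"])
# 0 b'Hell'
# 1 b'o, G'
# 2 b'ridF'
# 3 b'S!'
```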

Like any other MongoDB document, the chunk has its own unique "_id". In addition, it has three other keys. "files_id" is the "_id" of the file document that contains the metadata for this chunk. "n" is the chunk number; this attribute tracks the order in which chunks appeared in the original file. Finally, "data" contains the binary data that makes up this chunk of the file.
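
Because "n" records each chunk's position, a client can rebuild a file even when chunks come back in arbitrary order: keep the chunks whose "files_id" matches, sort them by "n", and concatenate their "data". The sketch below does this over an in-memory list of dictionaries; against a real server the corresponding step is roughly a query on the chunks collection filtered by "files_id" and sorted by "n". The function name reassemble and the sample documents are hypothetical.

```python
def reassemble(chunk_docs: list[dict], files_id) -> bytes:
    """Rebuild a file's bytes from its chunk documents.

    Chunks belonging to other files are ignored, and the remaining
    chunks are ordered by their chunk number "n" before joining.
    """
    own = [c for c in chunk_docs if c["files_id"] == files_id]
    own.sort(key=lambda c: c["n"])
    # Sanity check: chunk numbers should run 0, 1, 2, ... with no gaps.
    if [c["n"] for c in own] != list(range(len(own))):
        raise ValueError("missing or duplicate chunk numbers")
    return b"".join(c["data"] for c in own)


# Chunks stored out of order, mixed with a chunk from another file.
stored = [
    {"_id": 3, "n": 2, "data": b"ridF", "files_id": "a"},
    {"_id": 1, "n": 0, "data": b"Hell", "files_id": "a"},
    {"_id": 9, "n": 0, "data": b"other", "files_id": "b"},
    {"_id": 4, "n": 3, "data": b"S!", "files_id": "a"},
    {"_id": 2, "n": 1, "data": b"o, G", "files_id": "a"},
]
print(reassemble(stored, "a"))  # b'Hello, GridFS!'
```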

The metadata for each file is stored in a separate collection, which defaults to fs.files. Each document in the files collection represents a single file in GridFS and can contain any custom metadata that should be associated with that file. In addition to any user-defined keys, there are a couple of keys that are mandated by the GridFS specification:

_id
A unique id for the file. This is what will be stored in each chunk as the value for the "files_id" key.

length
The total number of bytes making up the content of the file.
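
A files document can be sketched the same way. This example builds one with the two keys described so far, "_id" and "length", plus arbitrary user-defined metadata, and then checks that "length" equals the total size of the data held in that file's chunks. The helper names, the "uploaded_by" key, and the use of uuid strings in place of ObjectIds are assumptions for illustration; the specification defines other keys as well (for example the chunk size), which this sketch leaves out.

```python
import uuid


def make_file_doc(data: bytes, **metadata) -> dict:
    """Build a minimal files-collection document for data.

    Only "_id" and "length" are filled in; keyword arguments become
    user-defined metadata keys. The spec-mandated keys are written last
    so user metadata cannot overwrite them.
    NOTE: a uuid hex string stands in for an ObjectId.
    """
    return {**metadata, "_id": uuid.uuid4().hex, "length": len(data)}


def length_matches(file_doc: dict, chunk_docs: list[dict]) -> bool:
    """Check that "length" equals the bytes stored in this file's chunks."""
    stored = sum(len(c["data"]) for c in chunk_docs
                 if c["files_id"] == file_doc["_id"])
    return stored == file_doc["length"]


content = b"Hello, GridFS!"
file_doc = make_file_doc(content, uploaded_by="alice")
chunks = [
    {"_id": uuid.uuid4().hex, "n": n, "data": content[i:i + 4],
     "files_id": file_doc["_id"]}
    for n, i in enumerate(range(0, len(content), 4))
]
print(file_doc["length"])                # 14
print(length_matches(file_doc, chunks))  # True
```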
