The API for working with GridFS from PyMongo is very similar to that of mongofiles:
we can easily perform the basic put, get, and list operations. Almost all of the
MongoDB drivers follow this basic pattern for working with GridFS, while often
exposing more advanced functionality as well. For driver-specific information on GridFS,
please check out the documentation for the specific driver you're using.
Under the Hood
GridFS is a lightweight specification for storing files that is built on top of normal
MongoDB documents. The MongoDB server actually does almost nothing to
“special-case” the handling of GridFS requests; all of the work is handled by the
client-side drivers and tools.
The basic idea behind GridFS is that we can store large files by splitting them up into
chunks and storing each chunk as a separate document. Because MongoDB supports
storing binary data in documents, we can keep storage overhead for chunks to a
minimum. In addition to storing each chunk of a file, we store a single document that groups
the chunks together and contains metadata about the file.
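
The splitting step can be sketched in plain Python, with no server involved. This is only an illustration under stated assumptions: the helper name split_into_chunks is hypothetical, and the chunk size of 8 bytes is chosen purely to keep the example small (real drivers pick their own, much larger, default).

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into consecutive pieces of at most chunk_size bytes.

    Hypothetical helper for illustration; not part of any driver API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


content = b"hello gridfs, this is a small file"
pieces = split_into_chunks(content, chunk_size=8)  # tiny size, illustration only

# Joining the pieces in their original order gives back the original bytes.
restored = b"".join(pieces)
```

Every piece except possibly the last is exactly chunk_size bytes long, which is why the order of the pieces is all that is needed to rebuild the file.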
The chunks for GridFS are stored in their own collection. By default chunks will use
the collection fs.chunks, but this can be overridden if needed. Within the chunks
collection, the structure of the individual documents is pretty simple:
{
    "_id" : ObjectId("..."),
    "n" : 0,
    "data" : BinData("..."),
    "files_id" : ObjectId("...")
}
Like any other MongoDB document, the chunk has its own unique "_id". In addition,
it has three other keys. "files_id" is the "_id" of the file document that holds
the metadata for the file this chunk belongs to. "n" is the chunk number; this attribute
records the chunk's position in the original file. Finally, "data" contains the binary data
that makes up this chunk of the file.
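
To see how these keys work together, here is a minimal sketch that models chunk documents as plain Python dicts and rebuilds a file from them. It makes several assumptions: make_chunk_docs and reassemble are hypothetical helpers, not driver functions; a random hex string stands in for an ObjectId; and plain strings are used as file ids. A real driver would query the chunks collection rather than filter a list.

```python
import uuid


def make_chunk_docs(files_id, data: bytes, chunk_size: int) -> list[dict]:
    """Build one chunk-like dict per piece, numbered from 0 in file order."""
    return [
        {
            "_id": uuid.uuid4().hex,  # stand-in for an ObjectId (assumption)
            "files_id": files_id,     # links the chunk to its file document
            "n": n,                   # position of this chunk in the file
            "data": data[start:start + chunk_size],
        }
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunk_docs, files_id) -> bytes:
    """Select the chunks of one file and join their data in order of "n"."""
    own = [c for c in chunk_docs if c["files_id"] == files_id]
    return b"".join(c["data"] for c in sorted(own, key=lambda c: c["n"]))


# Two files share one "collection"; reverse it to show that storage order
# does not need to match file order.
docs = make_chunk_docs("file-a", b"abcdefghij", 4) + make_chunk_docs("file-b", b"XYZ", 4)
docs.reverse()
result = reassemble(docs, "file-a")
```

Filtering on "files_id" and sorting on "n" is enough to recover each file, even when chunks from different files are interleaved in the same collection.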
The metadata for each file is stored in a separate collection, which defaults to fs.files.
Each document in the files collection represents a single file in GridFS and can contain
any custom metadata that should be associated with that file. In addition to any user-
defined keys, there are a couple of keys that are mandated by the GridFS specification:
_id
A unique id for the file—this is what will be stored in each chunk as the value for
the "files_id" key.
length
The total number of bytes making up the content of the file.
 