Working with Chunk Sizes
The put command also returns the chunk size because, although there is a default chunk
size, this default can be changed on a file-by-file basis. This allows flexible sizing. If your
website streams video, you might want to have many chunks so that you can easily skip
to any part of a given video. If you had one big file, you would have to return the
whole file and then find the starting point of the requested section in it. With GridFS, you
can pull back data at the chunk level. If you're using the default size, then you can start
retrieving data from any 256K chunk boundary. Of course, you can also request only the
portion of data you actually want (for example, five minutes in the middle of a
sixty-minute movie). This is a very efficient system, and 256K is a pretty good chunk size for
most purposes. If you decide to change it, you should have a good reason for doing so. As
always, benchmark and test the performance of your custom chunk size;
it's not uncommon for theoretically better systems to fail to live up to expectations.
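To make the chunk-level retrieval idea concrete, here is a minimal sketch of the arithmetic involved. It is not the GridFS driver code; the function name chunks_for_range and the constant-bitrate video are assumptions made for illustration. It only works out which chunk numbers cover a requested byte range when the 256K default is in use.

```python
def chunks_for_range(start, end, chunk_size=256 * 1024):
    """Return the chunk numbers that cover the byte range [start, end).

    Hypothetical helper for illustration; chunk numbers start at 0.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    first = start // chunk_size       # chunk holding the first requested byte
    last = (end - 1) // chunk_size    # chunk holding the last requested byte
    return range(first, last + 1)


# Assumption: a sixty-minute video encoded at a constant 1 MiB per minute.
bytes_per_minute = 1024 * 1024
start = 25 * bytes_per_minute         # minute 25
end = 30 * bytes_per_minute           # minute 30
needed = chunks_for_range(start, end)
print(needed[0], needed[-1], len(needed))  # prints: 100 119 20
```

Under these assumptions, only 20 chunks need to be fetched for the five-minute section, rather than the whole file.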
Note
MongoDB has a 16MB restriction on document size. Because GridFS is simply a
different way of storing files in the standard MongoDB framework, this restriction also exists
in GridFS. That is, you can't create chunks larger than 16MB. This shouldn't pose a problem,
because the whole point of GridFS is to alleviate the need for huge document sizes. If you're
worried that storing huge files will give you too many chunk documents, you
needn't worry; there are MongoDB systems in production with significantly more than a
billion documents!
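As a rough illustration of how many chunk documents a file produces, the sketch below divides the file length by the chunk size and rounds up. The function name chunk_count is hypothetical, and rejecting chunk sizes at or above 16MB is an assumption based on the fact that each chunk document also carries a few small bookkeeping fields alongside the data.

```python
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # document size limit described above


def chunk_count(file_length, chunk_size=256 * 1024):
    """Return how many chunk documents a file of file_length bytes needs.

    Hypothetical helper for illustration, not part of any driver.
    """
    if file_length < 0:
        raise ValueError("file_length must not be negative")
    if not 0 < chunk_size < MAX_DOCUMENT_BYTES:
        # Assumption: the chunk data must leave room for the other fields
        # of the chunk document, so it has to stay below the 16MB limit.
        raise ValueError("chunk_size must be between 1 byte and just under 16MB")
    # Integer ceiling division: a partial final chunk still needs a document.
    return (file_length + chunk_size - 1) // chunk_size


four_gib = 4 * 1024 ** 3
print(chunk_count(four_gib))  # prints: 16384
```

Even a 4 GiB file yields only a few thousand chunk documents at the default size, far short of the document counts mentioned above.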
Tracking the Upload Date
The uploadDate key does exactly what its name suggests: it stores the date the file was
created in MongoDB. This is a good time to mention that the files collection is just a
normal MongoDB collection, containing normal documents. This means that you can
add any additional key and value pairs that you need, in the same way you would for any
other collection.
For example, consider a real-world application that needs to store
text content extracted from various files. You might need this so that you can
perform some additional indexing and searching. To accomplish this, you might add a
file_text key and store the text in there. The elegance of the GridFS system means that
you can do anything with these documents that you can do with any other MongoDB document.
Elegance and power are two of the defining characteristics of working in MongoDB.
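The following sketch shows what such a document might look like. It uses plain Python dictionaries as a stand-in for the files collection, so it runs without a database. The sample filenames, sizes, and text, as well as the search_file_text helper, are invented for illustration; only the file_text key comes from the discussion above.

```python
from datetime import datetime, timezone

# Stand-in for the files collection: a list of ordinary documents.
# All values here are made up for the example.
files_collection = [
    {
        "filename": "report.pdf",
        "length": 1048576,
        "chunkSize": 262144,
        "uploadDate": datetime(2010, 5, 1, tzinfo=timezone.utc),
        "file_text": "Quarterly sales figures for the northern region",
    },
    {
        "filename": "notes.txt",
        "length": 2048,
        "chunkSize": 262144,
        "uploadDate": datetime(2010, 5, 2, tzinfo=timezone.utc),
        "file_text": "Meeting notes about the new storage layout",
    },
]


def search_file_text(docs, term):
    """Return filenames whose file_text contains term (case-insensitive)."""
    term = term.lower()
    return [d["filename"] for d in docs if term in d.get("file_text", "").lower()]


print(search_file_text(files_collection, "sales"))  # prints: ['report.pdf']
```

In a real deployment the same kind of lookup would be an ordinary query against the files collection, since the extra key is stored like any other.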
Hashing Your Files
MongoDB ships with the MD5 hashing algorithm. You may have come across the
algorithm previously when downloading software over the Internet. The theory behind
MD5 is that each file has a unique signature. Changing a single bit anywhere in that file
 
 