GridFS - MongoDB Basics


Working with Chunk Sizes

The put command also returns the chunk size because, although there is a default chunk size, this default can be changed on a file-by-file basis. This allows flexible sizing. If your website streams video, you might want to have many chunks so that you can skip to any part of a given video with ease. If you had one big file, you would have to return the whole file and then find the starting point for the specified section in it. With GridFS, you can pull back data at the chunk level. If you're using the default size, then you can start retrieving data from any 256K chunk. Of course, you can also specify the bit of data you actually want (for example, you might want only five minutes in the middle of a sixty-minute movie). This is a very efficient system, and 256K is a pretty good chunk size for most purposes. If you decide to change it, you should have a good reason for doing so. As always, don't forget to benchmark and test the performance of your custom chunk size; it's not uncommon for theoretically better systems to fail to live up to expectations.
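The chunk-level access described above comes down to a little arithmetic. The following is a minimal sketch, assuming the 256K default chunk size and the sequence number n that GridFS stores on each chunk document; the function name chunks_for_byte_range is hypothetical and not part of any driver.

```python
DEFAULT_CHUNK_SIZE = 256 * 1024  # the 256K default discussed above, in bytes


def chunks_for_byte_range(start, length, chunk_size=DEFAULT_CHUNK_SIZE):
    """Return (first_n, last_n, offset) for the bytes [start, start + length).

    first_n and last_n are the chunk sequence numbers that cover the range;
    offset is where the range begins inside the first chunk.
    """
    if start < 0 or length <= 0 or chunk_size <= 0:
        raise ValueError("start must be >= 0; length and chunk_size must be > 0")
    first_n = start // chunk_size
    last_n = (start + length - 1) // chunk_size
    offset = start % chunk_size
    return first_n, last_n, offset


# Example: 500,000 bytes starting at byte 300,000 of a file
first_n, last_n, offset = chunks_for_byte_range(300_000, 500_000)
print(first_n, last_n, offset)
```

Only the chunks numbered first_n through last_n need to be read to serve that range, which is why smaller chunks make seeking cheaper at the cost of more chunk documents.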

■ Note: MongoDB has a 16MB restriction on document size. Because GridFS is simply a different way of storing files in the standard MongoDB framework, this restriction also exists in GridFS. That is, you can't create chunks larger than 16MB. This shouldn't pose a problem, because the whole point of GridFS is to alleviate the need for huge document sizes. If you're worried that you're storing huge files, and this will give you too many chunk documents, you needn't worry: there are MongoDB systems in production with significantly more than a billion documents!
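To get a feel for how many chunk documents a given file produces, a rough estimate can be computed as below. This is a sketch under stated assumptions: the helper name chunk_count is hypothetical, and the 16MB check only mirrors the restriction described in the note (in practice a chunk has to be somewhat smaller, because the chunk document holds other fields as well).

```python
MAX_DOCUMENT_SIZE = 16 * 1024 * 1024  # the 16MB document limit, in bytes
DEFAULT_CHUNK_SIZE = 256 * 1024       # the 256K default chunk size, in bytes


def chunk_count(file_size, chunk_size=DEFAULT_CHUNK_SIZE):
    """Estimate how many chunk documents a file of file_size bytes needs."""
    if file_size < 0:
        raise ValueError("file_size must be >= 0")
    if chunk_size <= 0 or chunk_size > MAX_DOCUMENT_SIZE:
        raise ValueError("chunk_size must be positive and no larger than 16MB")
    # Ceiling division: a final, partially filled chunk still counts as one.
    return -(-file_size // chunk_size)


one_gib = 1024 ** 3
print(chunk_count(one_gib))  # 4096 chunks at the default size
```

Even a 1GB file yields only a few thousand chunk documents at the default size, which is small next to the document counts mentioned in the note.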

Tracking the Upload Date

The uploadDate key does exactly what its name suggests: it stores the date the file was created in MongoDB. This is a good time to mention that the files collection is just a normal MongoDB collection, containing normal documents. This means that you can add any additional key and value pairs that you need, in the same way you would for any other collection.

For example, consider a real-world application that needs to store text content extracted from various files. You might need to do this so you can perform some additional indexing and searching. To accomplish this, you might add a file_text key and store the text in there. The elegance of the GridFS system means that you can do anything with these documents that you can do with any other MongoDB documents. Elegance and power are two of the defining characteristics of working in MongoDB.
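The file_text idea can be pictured with a plain dictionary standing in for an entry in the files collection. This is only an illustration: the document values and the helper with_extracted_text are hypothetical, and no database connection is involved.

```python
from datetime import datetime, timezone


def with_extracted_text(files_doc, text):
    """Return a copy of a files-collection document with a file_text key added."""
    updated = dict(files_doc)
    updated["file_text"] = text
    return updated


# Illustrative document shaped like an entry in the files collection
files_doc = {
    "filename": "report.txt",
    "length": 1024,
    "chunkSize": 256 * 1024,
    "uploadDate": datetime(2024, 1, 1, tzinfo=timezone.utc),
}

doc = with_extracted_text(files_doc, "quarterly results and forecasts")
print(sorted(doc))
```

With a real driver the extra key would be written like any other field; in PyMongo, for instance, additional keyword arguments passed to GridFS.put are stored as fields on the file document.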

Hashing Your Files

MongoDB ships with the MD5 hashing algorithm. You may have come across the algorithm previously when downloading software over the Internet. The theory behind MD5 is that each file has a unique signature. Changing a single bit anywhere in that file
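The sensitivity of an MD5 signature to a single changed bit can be seen with Python's standard hashlib module. This is a minimal sketch using an arbitrary sample payload; it illustrates the hash itself, not how MongoDB computes or stores it.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


original = b"GridFS example payload"
# Flip the lowest bit of the first byte, leaving everything else unchanged.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

original_digest = md5_hex(original)
flipped_digest = md5_hex(flipped)

print(original_digest)
print(flipped_digest)
print(original_digest != flipped_digest)
```

Comparing a stored digest with one recomputed from the retrieved bytes is the usual way such a signature is used to detect a corrupted or altered file.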
