Binary data and GridFS - MongoDB in Action


> use images
> db.thumbnails.findOne({}, {data: 0})
{
  "_id" : ObjectId("4d608614238d3b4ade000001"),
  "md5" : BinData(5,"K1ud3EUjT49wdMdkOGjbDg=="),
  "name" : "monument-thumb.jpg"
}

Note that the md5 field is clearly marked as binary data, showing both its subtype (5, the BSON subtype for MD5 digests) and its payload, which the shell displays base64-encoded.
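To see where a value like that comes from, here's a minimal Ruby sketch using only the standard library. It assumes the md5 field holds the raw 16-byte MD5 digest of the image bytes, and it mimics the shell's display by base64-encoding that digest. The helper names md5_bindata_string and bindata_payload_hex are hypothetical, not part of any driver.

```ruby
require 'digest/md5'

# Hypothetical helper: render the MD5 digest of some bytes the way the
# mongo shell prints a BinData value of subtype 5 (MD5).
def md5_bindata_string(bytes)
  digest = Digest::MD5.digest(bytes)  # raw 16-byte binary digest, not hex
  payload = [digest].pack('m0')       # strict base64, no trailing newline
  "BinData(5,\"#{payload}\")"
end

# Hypothetical helper: turn a shell-style base64 payload back into the
# familiar 32-character hex form of the digest.
def bindata_payload_hex(payload)
  payload.unpack1('m0').unpack1('H*')
end

puts md5_bindata_string('')  # prints BinData(5,"1B2M2Y8AsgTpgAmY7PhCfg==")
puts bindata_payload_hex('K1ud3EUjT49wdMdkOGjbDg==')
```

A 16-byte digest always encodes to 24 base64 characters ending in "==", which is exactly the shape of the payload in the thumbnail document above.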

C.2 GridFS

GridFS is a convention for storing files of arbitrary size in MongoDB. The GridFS specification is implemented by all of the official drivers and by MongoDB's mongofiles tool, ensuring consistent access across platforms. GridFS is useful for storing large binary objects in the database. It's frequently fast enough to serve these objects as well, and the storage method is conducive to streaming.

The term GridFS frequently leads to confusion, so two clarifications are worth making right off the bat. The first is that GridFS isn't an intrinsic feature of MongoDB. As mentioned, it's a convention that all the official drivers (and some tools) use to manage large binary objects in the database. Second, it's important to clarify that GridFS doesn't have the rich semantics of bona fide file systems. For instance, there's no protocol for locking and concurrency, and this limits the GridFS interface to simple put, get, and delete operations. This means that if you want to update a file, you need to delete it and then put the new version.
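The restricted interface means an "update" is really two separate operations. The following Ruby sketch models that with a hypothetical in-memory ToyGrid class (not a driver API): files can only be put, fetched, or deleted, so replacing a file's contents means deleting the old version and putting a new one, which receives a new id.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for a GridFS store. It exposes only the
# three operations GridFS supports: put, get, and delete.
class ToyGrid
  def initialize
    @files = {}
  end

  # Store a copy of the bytes and return a freshly generated id.
  def put(bytes, filename:)
    id = SecureRandom.hex(12)  # 24 hex chars, the same length as an ObjectId string
    @files[id] = { filename: filename, data: bytes.dup.freeze }
    id
  end

  # Return the stored file hash, or nil if no file has this id.
  def get(id)
    @files[id]
  end

  def delete(id)
    @files.delete(id)
  end

  def count
    @files.size
  end
end

# There is no in-place update: delete the old file, then put the new version.
# Nothing coordinates the two steps, so a concurrent reader could find the
# file missing in between.
def replace_file(grid, old_id, new_bytes, filename:)
  grid.delete(old_id)
  grid.put(new_bytes, filename: filename)
end

grid = ToyGrid.new
v1 = grid.put('version one', filename: 'canyon.jpg')
v2 = replace_file(grid, v1, 'version two', filename: 'canyon.jpg')
puts grid.get(v2)[:data]  # prints version two
```

Because the replacement gets a new id, any application code that stored the old id has to be updated as well.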

GridFS works by dividing a large file into small, 256 KB chunks and then storing each chunk as a separate document. By default, these chunks are stored in a collection called fs.chunks. Once the chunks are written, the file's metadata is stored in a single document in another collection called fs.files. Figure C.1 contains a simplistic illustration of this process applied to a theoretical 1 MB file called canyon.jpg, which GridFS would split into four 256 KB chunks.
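Here's a minimal Ruby sketch of that split, using plain hashes in place of real documents. The chunk fields (files_id, n, data) and the file fields (filename, length, chunkSize, uploadDate, md5) follow the GridFS specification, but the helper gridfs_documents and the random hex id are assumptions for illustration; a real driver generates an ObjectId and writes these documents for you.

```ruby
require 'digest/md5'
require 'securerandom'

CHUNK_SIZE = 256 * 1024  # 256 KB, the chunk size described in the text

# Hypothetical helper: split raw bytes into GridFS-style chunk documents
# plus one file-metadata document, represented here as plain hashes.
def gridfs_documents(bytes, filename)
  file_id = SecureRandom.hex(12)  # stand-in for a real ObjectId
  bytes = bytes.b                 # treat the input as raw binary

  # Round up so a partial final chunk still gets its own document.
  chunk_count = (bytes.bytesize + CHUNK_SIZE - 1) / CHUNK_SIZE
  chunks = Array.new(chunk_count) do |n|
    { files_id: file_id, n: n, data: bytes.byteslice(n * CHUNK_SIZE, CHUNK_SIZE) }
  end

  file_doc = {
    _id: file_id,
    filename: filename,
    length: bytes.bytesize,
    chunkSize: CHUNK_SIZE,
    uploadDate: Time.now.utc,
    md5: Digest::MD5.hexdigest(bytes)
  }
  [file_doc, chunks]
end

# Reassemble the original bytes by concatenating chunks in order of n.
def reassemble(chunks)
  chunks.sort_by { |c| c[:n] }.map { |c| c[:data] }.join.b
end

one_mb = SecureRandom.random_bytes(1024 * 1024)
file_doc, chunks = gridfs_documents(one_mb, 'canyon.jpg')
puts "#{file_doc[:filename]}: #{file_doc[:length]} bytes in #{chunks.size} chunks"
```

The ordering field n is what makes the layout friendly to streaming: a reader can fetch and emit chunks one at a time, in order, without holding the whole file in memory.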

That should be enough theory to use GridFS. Next, we'll see GridFS in practice through the Ruby GridFS API and the mongofiles utility.

C.2.1 GridFS in Ruby

Earlier you stored a small image thumbnail. The thumbnail took up only 10 KB and was thus ideal for keeping in a single document. The original image is almost 2 MB in size and is therefore much more appropriate for GridFS storage. Here you'll store the original using Ruby's GridFS API. First, you connect to the database and then initialize a Grid object, which takes a reference to the database where the GridFS file will be stored.

Next, you open the original image file, canyon.jpg, for reading. The most basic GridFS interface consists of methods to put and get a file. Here you use the Grid#put method, which takes either a string of binary data or an IO object, such as a file pointer. You pass in the file pointer, and the data is written to the database.
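Accepting either a string or an IO object is ordinary Ruby duck typing. The sketch below is a hypothetical normalizing helper, read_payload, not the driver's own code: it reads anything that responds to read and otherwise treats the argument as a string of bytes. StringIO stands in for the opened canyon.jpg so the example runs without the image or a database.

```ruby
require 'stringio'

# Hypothetical helper showing how a put-style method can accept either
# a String of binary data or an IO-like object such as a File.
def read_payload(data_or_io)
  if data_or_io.respond_to?(:read)
    data_or_io.read.b  # IO-like: File, StringIO, and so on
  elsif data_or_io.is_a?(String)
    data_or_io.b       # already in memory; view it as raw bytes
  else
    raise ArgumentError, "expected a String or IO, got #{data_or_io.class}"
  end
end

fake_jpeg = "\xFF\xD8\xFF\xE0JFIF".b  # a few bytes resembling a JPEG header
from_string = read_payload(fake_jpeg)
from_io = read_payload(StringIO.new(fake_jpeg))
puts from_string == from_io  # prints true
```

With a real image, opening it in binary mode (File.open('canyon.jpg', 'rb')) avoids any newline or encoding translation before the bytes reach the database.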
