GridFS - The Definitive Guide to MongoDB

On the other hand, things become a lot harder if you want to search a file or store complicated or structured data. Even if you can work out how to do this and create a solution, your solution is unlikely to be faster or more efficient than relying on a database. Today's applications depend on finding and storing data quickly, and databases make this possible for those of us who can't or don't want to write such a system ourselves.

One area that is glossed over by many books is the storing of files. Most books that teach you to use a database to store your data also teach you to read and write to the filesystem when you need to store files. In some ways, this isn't a problem, because it's much easier to read and write simple files than to process what's in them. There are some issues, however. First, the developer must have permission to write those files in the first place, and that requires giving the web server permission to write to the local filesystem. This might not seem likely to pose a problem, but it gives system administrators nightmares: getting files onto a server is the first stage in being able to compromise it.

Databases can store binary files; it's just not usually elegant for them to do so. MySQL has a special column type called BLOB. PostgreSQL requires special procedures to be followed to store such files, and the data isn't stored in the table itself. In other words, it's messy. These solutions are obviously bolt-ons. Thus, it's not surprising that people choose to write data to the disk instead. But that approach also has issues. Apart from the problems with security, it adds another directory that needs to be backed up, and you must also ensure that this information is replicated to all the appropriate servers. There are filesystems that provide the ability to write to disk and have that content fully replicated (including GFS), but these solutions are complex and add overhead. Moreover, these features typically make your solution harder to maintain.

MongoDB, on the other hand, enforces a maximum document size of 16 MB. This is more than enough for storing rich documents, and it might have sufficed a few years ago for storing many other types of files as well. However, this limit is wholly inadequate for today's environment.

Working with GridFS

Next, we'll take a brief look at how GridFS is implemented. As the MongoDB website points out, you do not need to understand or be aware of the underlying implementation of GridFS to use it. In fact, you can simply let the driver handle the heavy lifting for you. For the most part, the drivers that support GridFS implement file handling in a language-specific way. For example, the MongoDB driver for Python works in a manner that is wholly consistent with Python, as you'll see shortly. If the ins and outs of GridFS don't interest you, then just skip ahead to the next section. We promise you won't miss anything that enables you to use MongoDB effectively!

GridFS consists of two parts. More specifically, it consists of two collections. One collection holds the filename and related information such as size (called metadata), while the other collection holds the file data itself, usually in 256K chunks. The specification calls for these to be named files and chunks, respectively. By default, the files and chunks collections are created in the fs namespace, but this can be changed. The ability to change the default namespace is useful if you want to store different types of files. For example, you might want to keep image and movie files separate.
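
To make the two-collection layout concrete, here is a minimal sketch in plain Python that splits a byte string into 256K pieces and builds the corresponding documents as ordinary dictionaries. It does not talk to MongoDB or use a driver. The field names (length, chunkSize, uploadDate, files_id, n, data) mirror those used by the GridFS specification, while the helper names split_into_gridfs_documents and reassemble, and the use of a UUID in place of a driver-generated ObjectId, are assumptions made purely for illustration.

```python
import datetime
import uuid

CHUNK_SIZE = 256 * 1024  # the 256K chunk size mentioned in the text


def split_into_gridfs_documents(filename, data, chunk_size=CHUNK_SIZE):
    """Build one 'files' document and a list of 'chunks' documents for data.

    Hypothetical helper: a real driver does this work for you.
    """
    file_id = uuid.uuid4().hex  # stand-in for the ObjectId a driver would create
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),      # total size of the file in bytes
        "chunkSize": chunk_size,  # size of every chunk except possibly the last
        "uploadDate": datetime.datetime.now(datetime.timezone.utc),
    }
    # Each chunk records which file it belongs to (files_id) and its position (n).
    chunks = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def reassemble(chunks):
    """Rebuild the original bytes by joining the chunks in order of n."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)
```

In this sketch, a file slightly larger than two chunks produces one files document and three chunks documents, the last of which holds only the leftover bytes. Reading the file back is a matter of fetching the chunks that share a files_id and joining them in order of n.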

Getting Started with the Command-Line Tools

Now that we have some of the background out of the way, let's look at how to get started with GridFS by exploring the command-line tools available to leverage it. First, we will need a file to play with. To keep things simple, let's use the dictionary file. On Ubuntu, you can find this at /usr/share/dict/words. However, there are various levels of symbolic links, so you might want to run this command first:

root@core2:/usr/share/dict# cat words > /tmp/dictionary

Note: In Ubuntu, you might need to use apt-get install wbritish to get the dictionary file installed.
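
If you would rather see where the symbolic links lead before copying, the same step can be sketched in Python using only the standard library. The function name copy_resolving_symlinks is hypothetical; it simply resolves the link chain with os.path.realpath and copies the file it finds at the end.

```python
import os
import shutil


def copy_resolving_symlinks(source, destination):
    """Copy the file that source ultimately points to; return its real path.

    Hypothetical helper for illustration only.
    """
    real_path = os.path.realpath(source)  # follows every level of symbolic link
    shutil.copyfile(real_path, destination)  # writes a regular file, not a link
    return real_path
```

Calling copy_resolving_symlinks("/usr/share/dict/words", "/tmp/dictionary") should leave you with the same /tmp/dictionary file as the cat command above, provided the dictionary file is installed, and the return value tells you which file the links actually resolve to.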
