Database Reference
In-Depth Information
Tracking the Upload Date
The uploadDate key does exactly what its name suggests: it stores the date the file was created in MongoDB. This is a
good time to mention that the files collection is just a normal MongoDB collection, containing normal documents.
This means that you can add any additional key and value pairs that you need, in the same way you would for any
other collection.
For example, consider the case of a real-world application that needs to store text content that you extract from
various files. You might need to do this so you could perform some additional indexing and searching. To accomplish
this, you might add a file_text key and store the text in there. The elegance of the GridFS system means that you
can do anything with this system you can do with any other MongoDB documents. Elegance and power are two of the
defining characteristics of working in MongoDB.
Hashing Your Files
MongoDB ships with the MD5 hashing algorithm. You may have come across the algorithm previously when
downloading software over the Internet. The theory behind MD5 is that each file has a unique signature. Changing
a single bit anywhere in that file will drastically (and noticeably) change the signature. This signature is used for
two reasons: security and integrity. For security, if you know what the MD5 hash is supposed to be and you trust the
source (perhaps a friend gave it to you), then you can be assured that the file has not been altered if the hash (often
called the checksum ) is correct. This also ensures that the file integrity has been maintained and that no data has
been lost or damaged. The MD5 hash of a particular file acts like a fingerprint for a file. The hash can be also used to
identify files that have different filenames but have the same contents.
â–  The MD5 algorithm is no longer considered secure, and it has been demonstrated that it is possible to
create two different files that have the same MD5 checksum, even though their contents are different. In cryptographic
terms, this is called a collision . Such collisions are bad because they mean it is possible for an attacker to alter a file in
such a way that it cannot be detected. This caveat remains somewhat theoretical because a great deal of effort and time
would be required to create such collisions intentionally; and even then, the files could be so different as to be obviously
not the same file. For this reason, MD5 is still the preferred method of determining file integrity because it is so widely
supported. however, if you want to use hashing for its security benefits, you are much better off using one of the Sha
family specifications—ideally Sha-256 or Sha-512. even these hashing families have some theoretical vulnerabilities;
however, no one has yet demonstrated a practical case of creating intentional collisions for the Sha family of hashes.
MongoDB uses MD5 to ensure file integrity, which is fine for most purposes. however, if you want to hash important data
(such as user passwords), you should probably consider using the Sha family of hashes instead.
Warning
Looking Under MongoDB's Hood
At this point, you have some data in a MongoDB database. Now let's take a closer look at that data under the covers. To
do this, you'll again use some command-line tools to connect to the database and query it. For example, try running
the find() command against the file created earlier:
$ mongo test
MongoDB shell version: 2.5.1-pre
connecting to: test
 
 
Search WWH ::




Custom Search