can live with dob in place of date_of_birth as a key name, you'll save 10 bytes per document. That may not sound like much, but once you have a billion such documents, you'll have saved nearly 10 GB of storage space just by using a shorter key name. This doesn't mean you should go to unreasonable lengths to ensure small key names; be sensible. But if you expect massive amounts of data, economizing on key names will save space.
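The saving comes from the fact that BSON stores every field name inline, as a NUL-terminated string, in every document. The Node.js sketch below computes the encoded size of a one-field document. It assumes the field holds a UTC datetime (BSON type 0x09, an 8-byte value) and that the key is ASCII; the helper name bsonDateDocSize is hypothetical. The value type doesn't affect the difference, since only the key bytes change.

```javascript
// Size in bytes of a BSON document holding a single UTC-datetime field.
// Layout per the BSON spec: int32 total length, then one element
// (1 type byte, the key as a NUL-terminated cstring, an 8-byte int64
// of milliseconds), then a trailing 0x00 terminator.
function bsonDateDocSize(key) {
  const lengthPrefix = 4;
  const typeByte = 1;
  const keyBytes = Buffer.byteLength(key, "utf8") + 1; // +1 for the NUL
  const dateValue = 8;
  const terminator = 1;
  return lengthPrefix + typeByte + keyBytes + dateValue + terminator;
}

const longName = bsonDateDocSize("date_of_birth"); // 4 + 1 + 14 + 8 + 1 = 28
const shortName = bsonDateDocSize("dob");          // 4 + 1 + 4 + 8 + 1 = 18
const savedPerDoc = longName - shortName;          // 10

// Total savings across a billion documents, in decimal gigabytes.
const savedBytes = savedPerDoc * 1e9;
console.log(savedPerDoc, savedBytes / 1e9, "GB"); // 10 10 GB
```

In binary units, 10^10 bytes is about 9.3 GiB.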
In addition to valid key names, documents must contain values that can be serialized into BSON. A table of BSON types, with examples and notes, can be found at http://bsonspec.org. Here, I'll only point out some of the highlights and gotchas.
Strings
All string values must be encoded as UTF-8. Though UTF-8 is quickly becoming the standard for character encoding, there are plenty of situations where an older encoding is still used. Users typically encounter issues with this when importing data generated by legacy systems into MongoDB. The solution usually involves either converting to UTF-8 before inserting or, barring that, storing the text as the BSON binary type.[10]
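For an import script written in Node.js, the convert-before-inserting approach might look like the sketch below: check whether the incoming bytes are already valid UTF-8 and, if not, transcode them. It assumes the legacy source is ISO-8859-1 (Latin-1), which Node's Buffer decodes natively; data in other legacy encodings (Windows-1252, Shift JIS, and so on) would need a dedicated conversion library. The helper names isValidUtf8 and toUtf8 are hypothetical, and the validity check is only a heuristic, since some Latin-1 byte sequences happen to be valid UTF-8.

```javascript
// A fatal decoder throws a TypeError on any invalid UTF-8 sequence
// instead of silently substituting U+FFFD.
const strictUtf8 = new TextDecoder("utf-8", { fatal: true });

function isValidUtf8(bytes) {
  try {
    strictUtf8.decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// Return UTF-8 bytes for a field value. Bytes that are already valid
// UTF-8 pass through unchanged; anything else is ASSUMED to be Latin-1.
function toUtf8(bytes) {
  if (isValidUtf8(bytes)) return Buffer.from(bytes);
  const text = Buffer.from(bytes).toString("latin1"); // one byte -> one code point
  return Buffer.from(text, "utf8");
}

// "café" as a legacy system might have written it in Latin-1.
const legacy = Buffer.from([0x63, 0x61, 0x66, 0xe9]);
console.log(isValidUtf8(legacy));              // false: 0xE9 opens a 3-byte sequence that never completes
console.log(toUtf8(legacy).toString("utf8"));  // café
```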
Numbers
BSON specifies three numeric types: double (64-bit floating point), int (32-bit signed integer), and long (64-bit signed integer). This means that BSON can encode any IEEE 754 double-precision value and any signed integer up to eight bytes in length. When serializing integers in dynamic languages, the driver will automatically determine whether to encode as an int or a long. In fact, there's only one common situation where a number's type must be made explicit, which is when inserting numeric data via the JavaScript shell. JavaScript, unhappily, natively supports just a single numeric type called Number, which is equivalent to an IEEE double. Consequently, if you want to save a numeric value from the shell as an integer, you need to be explicit, using either NumberLong() or NumberInt(). Try this example:
db.numbers.save({n: 5});
db.numbers.save({ n: NumberLong(5) });
You've just saved two documents to the numbers collection. And though their values
are equal, the first is saved as a double and the second as a long integer. Querying for
all documents where n is 5 will return both documents:
> db.numbers.find({n: 5});
{ "_id" : ObjectId("4c581c98d5bbeb2365a838f9"), "n" : 5 }
{ "_id" : ObjectId("4c581c9bd5bbeb2365a838fa"), "n" : NumberLong( 5 ) }
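Although the query treats the two values as equal, they are stored with different BSON type codes. A minimal one-field encoder makes this visible. Per the BSON spec, a double is element type 0x01, a 32-bit integer is 0x10 (16), and a 64-bit integer is 0x12 (18). The pickIntegerType helper sketches the int-versus-long choice that dynamic-language drivers make: use 32 bits if the value fits, otherwise 64. This is a simplification, and individual drivers differ in the details. All function names here are hypothetical, and BigInt stands in for a dynamic language's arbitrary-size integer.

```javascript
const BSON_DOUBLE = 0x01; // type 1
const BSON_INT32 = 0x10;  // type 16
const BSON_INT64 = 0x12;  // type 18

// Choose int32 when the value fits, else int64; reject anything wider.
function pickIntegerType(n) {
  if (n >= -(2n ** 31n) && n < 2n ** 31n) return BSON_INT32;
  if (n >= -(2n ** 63n) && n < 2n ** 63n) return BSON_INT64;
  throw new RangeError(`${n} does not fit in a BSON integer`);
}

// Encode { [key]: value } as a BSON document using the given numeric type.
function encodeNumberDoc(key, value, type) {
  const keyBytes = Buffer.from(key + "\0", "utf8"); // cstring key
  const valueBytes = Buffer.alloc(type === BSON_INT32 ? 4 : 8);
  if (type === BSON_DOUBLE) valueBytes.writeDoubleLE(Number(value));
  else if (type === BSON_INT32) valueBytes.writeInt32LE(Number(value));
  else if (type === BSON_INT64) valueBytes.writeBigInt64LE(BigInt(value));
  else throw new Error(`unsupported type ${type}`);

  const size = 4 + 1 + keyBytes.length + valueBytes.length + 1;
  const doc = Buffer.alloc(size); // zero-filled, so the final terminator is 0x00
  doc.writeInt32LE(size, 0);      // total document length, little-endian
  doc.writeUInt8(type, 4);        // element type byte
  keyBytes.copy(doc, 5);
  valueBytes.copy(doc, 5 + keyBytes.length);
  return doc;
}

// The shell's { n: 5 } versus { n: NumberLong(5) }.
const asDouble = encodeNumberDoc("n", 5, BSON_DOUBLE);
const asLong = encodeNumberDoc("n", 5n, BSON_INT64);
console.log(asDouble.toString("hex")); // 10000000016e00000000000000144000
console.log(asLong.toString("hex"));   // 10000000126e00050000000000000000
```

The fifth byte of each document (01 versus 12 in hex) is the type code that distinguishes the two values on disk.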
In the shell output, the second value is marked as a long integer. Another way to see this is to query by BSON type using the special $type operator. Each BSON type is identified by an integer, beginning with 1. If you consult the BSON spec at http://bsonspec.org, you'll see that doubles are type 1 and that 64-bit integers are type 18. Thus, you can query the collection for values by type:
[10] Incidentally, if you're new to character encodings, you owe it to yourself to read Joel Spolsky's well-known introduction (http://mng.bz/LVO6). If you're a Rubyist, you may also want to read James Edward Gray's series on character encodings in Ruby 1.8 and 1.9 (http://mng.bz/wc4J).