Introduction to MongoDB - The Definitive Guide to MongoDB


BSON is an open standard; you can find its specification at http://bsonspec.org/. When people hear that BSON is

a binary form of JSON, they expect it to take up much less room than text-based JSON. However, that isn't necessarily the

case; indeed, there are many cases where the BSON version takes up more space than its JSON equivalent.
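
The size difference can be checked on a tiny document. The following is a minimal sketch, not a real driver: encode_bson is a hypothetical hand-rolled helper that covers only two BSON element types (int32 and string) as laid out in the specification, and the sample document is made up for illustration.

```python
import json
import struct


def encode_bson(doc):
    """Encode a flat dict of int32 and str values as a BSON document.

    Illustration only: a real driver supports every BSON type.
    """
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"  # field name as a C string
        if isinstance(value, int):
            # 0x10 = int32, stored as 4 little-endian bytes
            body += b"\x10" + name + struct.pack("<i", value)
        elif isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 = string: int32 byte length (including the NUL), then the bytes
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # The total length counts its own 4 bytes plus the trailing NUL terminator.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


doc = {"a": 1, "b": 2}
as_json = json.dumps(doc, separators=(",", ":")).encode("utf-8")
as_bson = encode_bson(doc)
print(len(as_json), len(as_bson))  # prints: 13 19
```

Here the compact JSON text needs 13 bytes, while the BSON form needs 19, because every number occupies a fixed four bytes and each element carries a type byte and a terminated field name.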

You might wonder why you should use BSON at all. After all, CouchDB (another powerful document-oriented

database) uses pure JSON, and it's reasonable to wonder whether it's worth the trouble of converting documents back

and forth between BSON and JSON.

First, we must remember that MongoDB is designed to be fast, rather than space-efficient. This doesn't mean that

MongoDB wastes space (it doesn't); however, a small bit of overhead in storing a document is perfectly acceptable

if that makes it faster to process the data (which it does). In short, BSON is much easier to traverse (that is, to look

through) and to index quickly. Although BSON can require slightly more disk space than JSON, this extra space is

unlikely to be a problem, because disks are cheap, and MongoDB can scale across machines. The tradeoff in this case

is quite reasonable: you exchange a bit of extra disk space for better query and indexing performance.
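
One reason a binary layout is quick to traverse is that values carry their type and length up front, so a reader can jump over fields it does not need. The sketch below illustrates that idea under stated assumptions: find_field is a hypothetical helper, it understands only int32 and string elements, and the sample bytes are built by hand rather than by a MongoDB driver.

```python
import struct


def find_field(bson_bytes, wanted):
    """Return (type_byte, raw_value_bytes) for a top-level field, or None.

    Supports only int32 (0x10) and string (0x02) elements.
    """
    (total,) = struct.unpack_from("<i", bson_bytes, 0)
    pos = 4
    end = total - 1  # the last byte is the document's NUL terminator
    while pos < end:
        type_byte = bson_bytes[pos]
        pos += 1
        name_end = bson_bytes.index(b"\x00", pos)
        name = bson_bytes[pos:name_end].decode("utf-8")
        pos = name_end + 1
        if type_byte == 0x10:
            size = 4  # int32 values have a fixed width
        elif type_byte == 0x02:
            (strlen,) = struct.unpack_from("<i", bson_bytes, pos)
            size = 4 + strlen  # length prefix plus the string bytes
        else:
            raise ValueError(f"unsupported element type: {type_byte:#x}")
        if name == wanted:
            return type_byte, bson_bytes[pos:pos + size]
        pos += size  # skip the value without decoding it
    return None


# Hand-built document equivalent to {"title": "MongoDB", "pages": 300}.
body = (
    b"\x02title\x00" + struct.pack("<i", 8) + b"MongoDB\x00"
    + b"\x10pages\x00" + struct.pack("<i", 300)
)
sample = struct.pack("<i", len(body) + 5) + body + b"\x00"

kind, raw = find_field(sample, "pages")
pages = struct.unpack("<i", raw)[0]
print(pages)  # prints: 300
```

Note that the title string is never decoded on the way to the pages field; its length prefix alone tells the reader how far to jump. A JSON reader would have to scan the text character by character to find where that string ends.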

The second key benefit to using BSON is that it is easy and quick to convert BSON to a programming language's

native data format. If the data were stored in pure JSON, the text would first have to be parsed and converted.

There are MongoDB drivers for a large number of programming languages (such as Python, Ruby, PHP, C, C++, and

C#), and each works slightly differently. Using a simple binary format, native data structures can be quickly built for

each language, without requiring that you first process JSON. This makes the code simpler and faster, both of which

are in keeping with MongoDB's stated goals.

BSON also provides some extensions to JSON. For example, it enables you to store binary data and to use

specific data types, such as dates, that JSON lacks. Thus, while BSON can store any JSON document, a valid BSON document may not be valid JSON.

This doesn't matter, because each language has its own driver that converts data to and from BSON without needing

to use JSON as an intermediary language.
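
To see why such a document may not be valid JSON, consider native values that have no JSON form. This is a small sketch using only Python's standard json module, not a MongoDB driver; the record and the json_rejects helper are made up for illustration. The BSON specification defines element types for binary data and UTC datetimes, whereas plain JSON has neither.

```python
import json
from datetime import datetime, timezone

# A record with a binary payload and a timestamp (illustrative values).
record = {
    "name": "logo.png",
    "data": b"\x89PNG",
    "uploaded": datetime(2010, 1, 1, tzinfo=timezone.utc),
}


def json_rejects(value):
    """Return True if the standard JSON encoder cannot serialize the value."""
    try:
        json.dumps(value)
    except TypeError:
        return True
    return False


rejected = {key: json_rejects(value) for key, value in record.items()}
print(rejected)  # prints: {'name': False, 'data': True, 'uploaded': True}
```

With plain JSON, the bytes and the timestamp would first have to be turned into strings by some convention of your own; a binary format with dedicated types avoids that step.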

At the end of the day, BSON is not likely to be a big factor in how you use MongoDB. Like all great tools, MongoDB

will quietly sit in the background and do what it needs to do. Apart from possibly using a graphical tool to look at your

data, you will generally work in your native language and let the driver worry about persisting to MongoDB.

Supporting Dynamic Queries

MongoDB's support for dynamic queries means that you can run a query without planning for it in advance. This is

similar to being able to run SQL queries against an RDBMS. You might wonder why this is listed as a feature; surely it

is something that every database supports—right?

Actually, no. For example, CouchDB (which is generally considered MongoDB's biggest “competitor”) doesn't

support dynamic queries. This is because CouchDB has come up with a completely new (and admittedly exciting)

way of thinking about data. A traditional RDBMS has static data and dynamic queries. This means that the structure of

the data is fixed in advance—tables must be defined, and each row has to fit into that structure. Because the database

knows in advance how the data is structured, it can make certain assumptions and optimizations that enable fast

dynamic queries.

CouchDB has turned this on its head. As a document-oriented database, CouchDB is schemaless, so the

data is dynamic. However, the new idea here is that queries are static. That is, you define them in advance, before

you can use them.

This isn't as bad as it might sound, because many queries can be easily defined in advance. For example, a

system that lets you search for a book will probably let you search by ISBN. In CouchDB, you would create an index

that builds a list of all the ISBNs for all the documents. When you punch in an ISBN, the query is very fast because it

doesn't actually need to search for any data. Whenever new data is added to the system, CouchDB will automatically

update its index.
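
The difference between the two styles can be sketched in a few lines of plain Python. This is only an illustration of the idea, not CouchDB's or MongoDB's actual API: DocumentStore, find_by_isbn, and find_where are hypothetical names, and the ISBN values are placeholders.

```python
from collections import defaultdict


class DocumentStore:
    """Toy store with one predefined (static) query and ad hoc filtering."""

    def __init__(self):
        self.docs = []
        self.by_isbn = defaultdict(list)  # index defined in advance

    def add(self, doc):
        self.docs.append(doc)
        # The index is kept up to date whenever new data arrives.
        if "isbn" in doc:
            self.by_isbn[doc["isbn"]].append(doc)

    def find_by_isbn(self, isbn):
        # Static query: a direct lookup, no scan over the documents.
        return list(self.by_isbn.get(isbn, []))

    def find_where(self, predicate):
        # Dynamic query: any condition, decided at call time, checked per document.
        return [doc for doc in self.docs if predicate(doc)]


store = DocumentStore()
store.add({"isbn": "isbn-1", "title": "First", "pages": 120})
store.add({"isbn": "isbn-2", "title": "Second", "pages": 480})
store.add({"title": "No ISBN here", "pages": 300})

by_index = store.find_by_isbn("isbn-2")
long_ones = store.find_where(lambda doc: doc.get("pages", 0) > 250)
print([doc["title"] for doc in by_index])   # prints: ['Second']
print([doc["title"] for doc in long_ones])  # prints: ['Second', 'No ISBN here']
```

The lookup by ISBN works only because someone planned for it; the page-count filter needed no preparation, but it has to examine every document unless a suitable index happens to exist.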

Technically, you can run a query against CouchDB without generating an index; in that case, however, CouchDB

will have to create the index itself before it can process your query. This won't be a problem if you only have a hundred

books; however, it will result in poor performance if you're storing hundreds of thousands of books, because each

query will generate the index again (and again). For this reason, the CouchDB team does not recommend dynamic

queries—that is, queries that haven't been predefined—in production.
