BSON is an open standard; you can find its specification at http://bsonspec.org/. When people hear that BSON is
a binary form of JSON, they expect it to take up much less room than text-based JSON. However, that isn't necessarily the
case; indeed, there are many cases where the BSON version takes up more space than its JSON equivalent.
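To make the size claim concrete, here is a minimal sketch that encodes a tiny document both ways. The `encode_bson` helper is a hypothetical hand-written encoder, not part of any MongoDB driver; it covers only two element types from the BSON specification (UTF-8 strings and 32-bit integers) and assumes a flat document.

```python
import json
import struct


def encode_bson(doc):
    """Encode a flat dict of str / int32 values as BSON (tiny subset of the spec)."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"  # element names are C strings
        if isinstance(value, bool):
            raise TypeError("booleans are outside this sketch")
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 = string: int32 byte length (including the trailing NUL), then bytes
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # 0x10 = 32-bit integer, always 4 bytes, little-endian
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")
    # document = int32 total size + elements + terminating NUL
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


doc = {"a": 1}
json_bytes = json.dumps(doc, separators=(",", ":")).encode("utf-8")
bson_bytes = encode_bson(doc)
print(len(json_bytes), len(bson_bytes))  # 7 bytes of JSON versus 12 bytes of BSON
```

The extra bytes come from the length prefix, the type tag, and the fixed-width integer, which is exactly the kind of overhead the text describes.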
You might wonder why you should use BSON at all. After all, CouchDB (another powerful document-oriented
database) uses pure JSON, and it's reasonable to wonder whether it's worth the trouble of converting documents back
and forth between BSON and JSON.
First, we must remember that MongoDB is designed to be fast, rather than space-efficient. This doesn't mean that
MongoDB wastes space (it doesn't); however, a small bit of overhead in storing a document is perfectly acceptable
if that makes it faster to process the data (which it does). In short, BSON is much easier to traverse (that is, to look
through) and to index quickly. Although BSON can require slightly more disk space than JSON, this extra space is
unlikely to be a problem, because disks are cheap, and MongoDB can scale across machines. The tradeoff in this case
is quite reasonable: you exchange a bit of extra disk space for better query and indexing performance.
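One reason traversal can be cheap is that, per the BSON specification, every value is either fixed-width or carries a length prefix, so a reader can jump over values it does not care about without decoding them. The sketch below illustrates that idea only; `find_int32` and the sample bytes are made up for this example, and the function handles just string and 32-bit integer elements in a flat document.

```python
import struct

# BSON bytes for {"hello": "world", "n": 7}, written out by hand for this example.
DOC = (
    b"\x1d\x00\x00\x00"                         # total document size: 29 bytes
    b"\x02hello\x00\x06\x00\x00\x00world\x00"   # string element
    b"\x10n\x00\x07\x00\x00\x00"                # int32 element
    b"\x00"                                     # document terminator
)


def find_int32(data, wanted):
    """Return the int32 stored under `wanted`, skipping other values unread."""
    (total,) = struct.unpack_from("<i", data, 0)
    pos = 4
    while pos < total - 1:
        type_byte = data[pos]
        end_of_name = data.index(b"\x00", pos + 1)
        name = data[pos + 1:end_of_name].decode("utf-8")
        pos = end_of_name + 1
        if type_byte == 0x10:            # int32: fixed 4 bytes
            if name == wanted:
                return struct.unpack_from("<i", data, pos)[0]
            pos += 4
        elif type_byte == 0x02:          # string: the length prefix says how far to jump
            (length,) = struct.unpack_from("<i", data, pos)
            pos += 4 + length            # skip the string without decoding it
        else:
            raise ValueError(f"type 0x{type_byte:02x} is outside this sketch")
    return None


print(find_int32(DOC, "n"))
```

With text-based JSON, reaching the same field would mean scanning every character of the preceding string to find where it ends.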
The second key benefit to using BSON is that it is easy and quick to convert BSON to a programming language's
native data format. If the data were stored in pure JSON, a relatively high-level conversion would need to take place.
There are MongoDB drivers for a large number of programming languages (such as Python, Ruby, PHP, C, C++, and
C#), and each works slightly differently. Using a simple binary format, native data structures can be quickly built for
each language, without requiring that you first process JSON. This makes the code simpler and faster, both of which
are in keeping with MongoDB's stated goals.
BSON also provides some extensions to JSON. For example, it enables you to store binary data and adds data types
that JSON lacks, such as dates. Thus, while BSON can store any JSON document, a valid BSON document may not be valid JSON.
This doesn't matter, because each language has its own driver that converts data to and from BSON without needing
to use JSON as an intermediary language.
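The gap between the two formats is easy to see from the JSON side. This small sketch uses only Python's standard `json` module (no MongoDB driver) to show that raw bytes and dates, both of which have dedicated BSON types, have no direct JSON representation; `is_json_serializable` is a helper invented for this example.

```python
import json
from datetime import datetime, timezone


def is_json_serializable(value):
    """Return True if the standard json module can encode `value` as-is."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


plain = {"title": "An example", "pages": 3}
with_binary = {"raw": b"\x00\x01\x02"}
with_date = {"created": datetime(2020, 1, 1, tzinfo=timezone.utc)}

for label, doc in [("plain", plain), ("binary", with_binary), ("date", with_date)]:
    print(label, is_json_serializable(doc))
```

Only the first document passes; the other two would need an agreed-upon text encoding before they could travel as JSON.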
At the end of the day, BSON is not likely to be a big factor in how you use MongoDB. Like all great tools, MongoDB
will quietly sit in the background and do what it needs to do. Apart from possibly using a graphical tool to look at your
data, you will generally work in your native language and let the driver worry about persisting to MongoDB.
Supporting Dynamic Queries
MongoDB's support for dynamic queries means that you can run a query without planning for it in advance. This is
similar to being able to run SQL queries against an RDBMS. You might wonder why this is listed as a feature; surely it
is something that every database supports—right?
Actually, no. For example, CouchDB (which is generally considered MongoDB's biggest “competitor”) doesn't
support dynamic queries. This is because CouchDB has come up with a completely new (and admittedly exciting)
way of thinking about data. A traditional RDBMS has static data and dynamic queries. This means that the structure of
the data is fixed in advance—tables must be defined, and each row has to fit into that structure. Because the database
knows in advance how the data is structured, it can make certain assumptions and optimizations that enable fast
dynamic queries.
CouchDB has turned this on its head. As a document-oriented database, CouchDB is schemaless, so the
data is dynamic. However, the new idea here is that queries are static. That is, you define them in advance, before
you can use them.
This isn't as bad as it might sound, because many queries can be easily defined in advance. For example, a
system that lets you search for a book will probably let you search by ISBN. In CouchDB, you would create an index
that builds a list of all the ISBNs for all the documents. When you punch in an ISBN, the query is very fast because it
doesn't actually need to search for any data. Whenever new data is added to the system, CouchDB will automatically
update its index.
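The idea of a query defined in advance can be sketched in plain Python. This is an analogy only, not CouchDB's actual API: `BookStore`, its methods, and the ISBN values are all invented for illustration. The point is that the index is maintained on every insert, so a lookup never has to scan the documents.

```python
class BookStore:
    """Toy in-memory store with one predefined index on the 'isbn' field."""

    def __init__(self):
        self.docs = []
        self.by_isbn = {}  # the predefined index: isbn -> list of documents

    def insert(self, doc):
        self.docs.append(doc)
        # The index is updated as part of every write, so it never goes stale.
        if "isbn" in doc:
            self.by_isbn.setdefault(doc["isbn"], []).append(doc)

    def find_by_isbn(self, isbn):
        # A dictionary lookup; no document is scanned at query time.
        return self.by_isbn.get(isbn, [])


store = BookStore()
store.insert({"isbn": "isbn-1", "title": "First Book"})
store.insert({"isbn": "isbn-2", "title": "Second Book"})
store.insert({"title": "Pamphlet without an ISBN"})

print(store.find_by_isbn("isbn-2"))
```

The cost is that only the questions anticipated when the index was defined can be answered this way.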
Technically, you can run a query against CouchDB without generating an index; in that case, however, CouchDB
will have to create the index itself before it can process your query. This won't be a problem if you only have a hundred
books; however, it will result in poor performance if you're storing hundreds of thousands of books, because each
query will generate the index again (and again). For this reason, the CouchDB team does not recommend dynamic
queries (that is, queries that haven't been predefined) in production.
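A rough way to see why rebuilding the index per query hurts is to count how many documents get touched. The sketch below is a plain-Python model under simple assumptions (an in-memory list of made-up book records and a hypothetical `build_index` helper); it does not measure CouchDB itself.

```python
stats = {"docs_scanned": 0}


def build_index(docs, field):
    """Build a value -> documents index, counting every document visited."""
    index = {}
    for doc in docs:
        stats["docs_scanned"] += 1
        if field in doc:
            index.setdefault(doc[field], []).append(doc)
    return index


def query_without_predefined_index(docs, field, value):
    # Models a query with no predefined index: the index is rebuilt on every call.
    return build_index(docs, field).get(value, [])


books = [{"isbn": f"isbn-{i}", "title": f"Book {i}"} for i in range(1000)]
wanted = ["isbn-1", "isbn-500", "isbn-999"]

# Predefined: build once, then answer every query from the index.
stats["docs_scanned"] = 0
index = build_index(books, "isbn")
predefined_hits = [index.get(isbn, []) for isbn in wanted]
scanned_predefined = stats["docs_scanned"]

# Not predefined: every query pays for a full rebuild.
stats["docs_scanned"] = 0
adhoc_hits = [query_without_predefined_index(books, "isbn", isbn) for isbn in wanted]
scanned_adhoc = stats["docs_scanned"]

print(scanned_predefined, scanned_adhoc)  # 1000 versus 3000 documents visited
```

Both approaches return the same results; the difference is that the work grows with the number of queries in the second case.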
 