Introduction to MongoDB - MongoDB Basics

Database Reference

In-Depth Information

cluster ). And although Oracle can do this with its impressive Real Application Clusters

(RAC) architecture, you can expect to take out a mortgage if you want to use that

solution—implementing a RAC-based solution requires multiple servers, shared storage,

and several software licenses.

You might wonder why having an active/active cluster on two databases is so

difficult. When you query your database, the database has to find all the relevant data

and link it all together. RDBMS solutions feature many ingenious ways to improve

performance, but they all rely on having a complete picture of the data available. And

this is where you hit a wall: this approach simply doesn't work when half the data is on

another server.

Of course, you might have a small database that simply gets lots of requests, so you

just need to share the workload. Unfortunately, here you hit another wall. You need

to ensure that data written to the first server is available to the second server. And you

face additional issues if updates are made on two separate masters simultaneously.

For example, you need to determine which update is the correct one. Another problem

you can encounter: someone might query the second server for information that has

just been written to the first server, but that information hasn't been updated yet on the

second server. When you consider all these issues, it becomes easy to see why the Oracle

solution is so expensive—these problems are extremely hard to address.

MongoDB solves the active/active cluster problems in a very clever way—it avoids

them completely. Recall that MongoDB stores data in BSON documents, so the data

is self-contained. That is, although similar documents are stored together, individual

documents aren't made up of relationships. This means that everything you need is all in

one place. Because queries in MongoDB look for specific keys and values in a document,

this information can be easily spread across as many servers as you have available. Each

server checks the content it has and returns the result. This effectively allows almost

linear scalability and performance. As an added bonus, it doesn't even require that you

take out a new mortgage to pay for this functionality.

Admittedly, MongoDB does not offer master/master replication , in which two

separate servers can both accept write requests. However, it does have sharding , which

allows data to split across multiple machines, with each machine responsible for

updating different parts of the dataset. The benefit of this design is that, while some

solutions allow two master databases, MongoDB can potentially scale to hundreds of

machines as easily as it can run on two.

Opting for Performance vs. Features

Performance is important, but MongoDB also provides a large feature set. We've

already discussed some of the features MongoDB doesn't implement, and you might

be somewhat skeptical of the claim that MongoDB achieves its impressive performance

partly by judiciously excising certain features common to other databases. However,

there are analogous database systems available that are extremely fast, but also extremely

limited, such as those that implement a key/value store.

A perfect example is memcached . This application was written to provide high-speed

data caching, and it is mind-numbingly fast. When used to cache website content, it can

speed up an application many times over. This application is used by extremely large

websites, such as Facebook and LiveJournal.

MongoDB Basics

Search WWH ::

Custom Search

Home