Introduction to MongoDB - The Definitive Guide to MongoDB

Database Reference

In-Depth Information

This version of the example improves on things a bit more. Now you can clearly see what each number is for.

JSON is extremely expressive, and, although it's quite easy to write JSON by hand, it is usually generated automatically

in software. For example, Python includes a module called (somewhat predictably) json that takes existing Python

objects and automatically converts them to JSON. Because JSON is supported and used on so many platforms, it is an

ideal choice for exchanging data.

When you add items such as the list of phone numbers, you are actually creating what is known as an embedded

document . This happens whenever you add complex content such as a list (or array , to use the term favored in

JSON). Generally speaking, there is also a logical distinction. For example, a Person document might have several

Address documents embedded inside it. Similarly, an Invoice document might have numerous LineItem documents

embedded inside it. Of course, the embedded Address document could also have its own embedded document that

contains phone numbers, for example.

Whether you choose to embed a particular document is determined when you decide how to store your

information. This is usually referred to as schema design . It might seem odd to refer to schema design when MongoDB is

considered a schemaless database. However, while MongoDB doesn't force you to create a schema or enforce one that

you create, you do still need to think about how your data fits together. We'll look at this in more depth in Chapter 3.

Adopting a Nonrelational Approach

Improving performance with a relational database is usually straightforward: you buy a bigger, faster server. And

this works great until you reach the point where there isn't a bigger server available to buy. At that point, the

only option is to spread out to two servers. This might sound easy, but it is a stumbling block for most databases.

For example, neither MySQL nor PostgresSQL can run a single database on two servers, where both servers can

both read and write data (often referred to as an active/active cluster ). And although Oracle can do this with its

impressive Real Application Clusters (RAC) architecture, you can expect to take out a mortgage if you want to use

that solution—implementing a RAC-based solution requires multiple servers, shared storage, and several

software licenses.

You might wonder why having an active/active cluster on two databases is so difficult. When you query your

database, the database has to find all the relevant data and link it all together. RDBMS solutions feature many

ingenious ways to improve performance, but they all rely on having a complete picture of the data available. And this

is where you hit a wall: this approach simply doesn't work when half the data is on another server.

Of course, you might have a small database that simply gets lots of requests, so you just need to share the

workload. Unfortunately, here you hit another wall. You need to ensure that data written to the first server is available

to the second server. And you face additional issues if updates are made on two separate masters simultaneously.

For example, you need to determine which update is the correct one. Another problem you can encounter: someone

might query the second server for information that has just been written to the first server, but that information hasn't

been updated yet on the second server. When you consider all these issues, it becomes easy to see why the Oracle

solution is so expensive—these problems are extremely hard to address.

MongoDB solves the active/active cluster problems in a very clever way—it avoids them completely. Recall

that MongoDB stores data in BSON documents, so the data is self-contained. That is, although similar documents

are stored together, individual documents aren't made up of relationships. This means that everything you need is

all in one place. Because queries in MongoDB look for specific keys and values in a document, this information can

be easily spread across as many servers as you have available. Each server checks the content it has and returns the

result. This effectively allows almost linear scalability and performance. As an added bonus, it doesn't even require

that you take out a new mortgage to pay for this functionality.

Admittedly, MongoDB does not offer master/master replication , in which two separate servers can both

accept write requests. However, it does have sharding , which allows data to split across multiple machines, with

each machine responsible for updating different parts of the dataset. The benefit of this design is that, while some

solutions allow two master databases, MongoDB can potentially scale to hundreds of machines as easily as it can

run on two.

The Definitive Guide to MongoDB

Search WWH ::

Custom Search

Home