Database Reference
In-Depth Information
This version of the example improves on things a bit more. Now you can clearly see what each number is for.
JSON is extremely expressive, and, although it's quite easy to write JSON by hand, it is usually generated automatically
in software. For example, Python includes a module called (somewhat predictably) json that takes existing Python
objects and automatically converts them to JSON. Because JSON is supported and used on so many platforms, it is an
ideal choice for exchanging data.
When you add items such as the list of phone numbers, you are actually creating what is known as an embedded
document . This happens whenever you add complex content such as a list (or array , to use the term favored in
JSON). Generally speaking, there is also a logical distinction. For example, a Person document might have several
Address documents embedded inside it. Similarly, an Invoice document might have numerous LineItem documents
embedded inside it. Of course, the embedded Address document could also have its own embedded document that
contains phone numbers, for example.
Whether you choose to embed a particular document is determined when you decide how to store your
information. This is usually referred to as schema design . It might seem odd to refer to schema design when MongoDB is
considered a schemaless database. However, while MongoDB doesn't force you to create a schema or enforce one that
you create, you do still need to think about how your data fits together. We'll look at this in more depth in Chapter 3.
Adopting a Nonrelational Approach
Improving performance with a relational database is usually straightforward: you buy a bigger, faster server. And
this works great until you reach the point where there isn't a bigger server available to buy. At that point, the
only option is to spread out to two servers. This might sound easy, but it is a stumbling block for most databases.
For example, neither MySQL nor PostgresSQL can run a single database on two servers, where both servers can
both read and write data (often referred to as an active/active cluster ). And although Oracle can do this with its
impressive Real Application Clusters (RAC) architecture, you can expect to take out a mortgage if you want to use
that solution—implementing a RAC-based solution requires multiple servers, shared storage, and several
software licenses.
You might wonder why having an active/active cluster on two databases is so difficult. When you query your
database, the database has to find all the relevant data and link it all together. RDBMS solutions feature many
ingenious ways to improve performance, but they all rely on having a complete picture of the data available. And this
is where you hit a wall: this approach simply doesn't work when half the data is on another server.
Of course, you might have a small database that simply gets lots of requests, so you just need to share the
workload. Unfortunately, here you hit another wall. You need to ensure that data written to the first server is available
to the second server. And you face additional issues if updates are made on two separate masters simultaneously.
For example, you need to determine which update is the correct one. Another problem you can encounter: someone
might query the second server for information that has just been written to the first server, but that information hasn't
been updated yet on the second server. When you consider all these issues, it becomes easy to see why the Oracle
solution is so expensive—these problems are extremely hard to address.
MongoDB solves the active/active cluster problems in a very clever way—it avoids them completely. Recall
that MongoDB stores data in BSON documents, so the data is self-contained. That is, although similar documents
are stored together, individual documents aren't made up of relationships. This means that everything you need is
all in one place. Because queries in MongoDB look for specific keys and values in a document, this information can
be easily spread across as many servers as you have available. Each server checks the content it has and returns the
result. This effectively allows almost linear scalability and performance. As an added bonus, it doesn't even require
that you take out a new mortgage to pay for this functionality.
Admittedly, MongoDB does not offer master/master replication , in which two separate servers can both
accept write requests. However, it does have sharding , which allows data to split across multiple machines, with
each machine responsible for updating different parts of the dataset. The benefit of this design is that, while some
solutions allow two master databases, MongoDB can potentially scale to hundreds of machines as easily as it can
run on two.
 
Search WWH ::




Custom Search