Database Reference
In-Depth Information
per user document, the collection is roughly 1 GB and can easily be served by a single
machine. But the spreadsheets collection is a different story. If you assume that each
user owns an average of 50 spreadsheets, and that each averages 50 KB in size, then
you're talking about a 1 TB spreadsheets collection. If this is a highly active applica-
tion, then you'll want to keep the data in RAM . To keep this data in RAM and distribute
reads and writes, you must shard the collection. That's where sharding comes into play.
Sharding a collection
MongoDB's sharding is range-based. This means that every document in a sharded
collection must fall within some range of values for a given key. MongoDB uses a so-
called shard key to place each document within one of these ranges. 3 You can under-
stand this better by looking at a sample document from the theoretical spreadsheet
management application:
{
_id: ObjectId("4d6e9b89b600c2c196442c21")
filename: "spreadsheet-1",
updated_at: ISODate("2011-03-02T19:22:54.845Z"),
username: "banks",
data: "raw document data"
}
When you shard this collection, you must declare one or more of these fields as the
shard key. If you choose _id , then documents will be distributed based on ranges of
object ID s. But for reasons that will become clear later, you're going to declare a com-
pound shard key on username and _id ; therefore, each range will usually represent
some series of user names.
You're now in a position to understand the concept of a chunk . A chunk is a contig-
uous range of shard key values located on a single shard. As an example, you can
imagine the docs collection being divided across two shards, A and B, into the chunks
you see in table 9.1. Each chunk's range is marked by a start and end value.
A cursory glance at this table reveals one of the main, sometimes counterintuitive,
properties of chunks: that although each individual chunk represents a contiguous
range of data, those ranges can appear on any shard.
A second important point about chunks is
that they're not physical but logical . In other
words, a chunk doesn't represent a contiguous
series of document on disk. Rather, to say that
the chunk beginning with harris and ending with
norris exists on shard A is simply to say that any
document with a shard key falling within this
range can be found in shard A's docs collection.
This implies nothing about the arrangement of
those documents within the collection.
Table 9.1
Chunks and shards
Start
End
Shard
-
abbot
B
abbot
dayton
A
dayton
harris
B
harris
norris
A
norris
B
3
Alternate distributed databases might use the terms partition key or distribution key instead.
 
Search WWH ::




Custom Search