Sharding - MongoDB in Action

Database Reference

In-Depth Information

per user document, the collection is roughly 1 GB and can easily be served by a single

machine. But the spreadsheets collection is a different story. If you assume that each

user owns an average of 50 spreadsheets, and that each averages 50 KB in size, then

you're talking about a 1 TB spreadsheets collection. If this is a highly active applica-

tion, then you'll want to keep the data in RAM . To keep this data in RAM and distribute

reads and writes, you must shard the collection. That's where sharding comes into play.

Sharding a collection

MongoDB's sharding is range-based. This means that every document in a sharded

collection must fall within some range of values for a given key. MongoDB uses a so-

called shard key to place each document within one of these ranges. 3 You can under-

stand this better by looking at a sample document from the theoretical spreadsheet

management application:

{

_id: ObjectId("4d6e9b89b600c2c196442c21")

filename: "spreadsheet-1",

updated_at: ISODate("2011-03-02T19:22:54.845Z"),

username: "banks",

data: "raw document data"

}

When you shard this collection, you must declare one or more of these fields as the

shard key. If you choose _id , then documents will be distributed based on ranges of

object ID s. But for reasons that will become clear later, you're going to declare a com-

pound shard key on username and _id ; therefore, each range will usually represent

some series of user names.

You're now in a position to understand the concept of a chunk . A chunk is a contig-

uous range of shard key values located on a single shard. As an example, you can

imagine the docs collection being divided across two shards, A and B, into the chunks

you see in table 9.1. Each chunk's range is marked by a start and end value.

A cursory glance at this table reveals one of the main, sometimes counterintuitive,

properties of chunks: that although each individual chunk represents a contiguous

range of data, those ranges can appear on any shard.

A second important point about chunks is

that they're not physical but logical . In other

words, a chunk doesn't represent a contiguous

series of document on disk. Rather, to say that

the chunk beginning with harris and ending with

norris exists on shard A is simply to say that any

document with a shard key falling within this

range can be found in shard A's docs collection.

This implies nothing about the arrangement of

those documents within the collection.

Table 9.1

Chunks and shards

Start

End

Shard

- ∞

abbot

B

abbot

dayton

A

dayton

harris

B

harris

norris

A

norris

∞

B

3

Alternate distributed databases might use the terms partition key or distribution key instead.

MongoDB in Action

Search WWH ::

Custom Search

Home