Sharding - The Definitive Guide to MongoDB

Database Reference

In-Depth Information

MongoDB has a unique method for sharding, where a MongoS routing process manages the splitting of the

data and the routing of requests to the required shard server. If a query requires data from multiple shards, then the

MongoS will manage the process of merging the data obtained from each shard back into a single cursor.

This feature, more than any other, is what earns MongoDB its stripes as a cloud or web-oriented database.

Analyzing a Simple Sharding Scenario

Let's assume you want to implement a simple sharding solution for a fictitious Gaelic social network. Figure 12-1

shows a simplified representation of how this application could be sharded.

Users A-C

....

Users M-O

Client

....

Users X-Z

Figure 12-1. Simple sharding of a User collection

There are a number of problems with this simplified view of our application. Let's look at the most obvious ones.

First, if your Gaelic network is targeted at the Irish and Scottish communities around the world, then the database

will have a large number of names that start with Mac and Mc (MacDonald, McDougal, and so on) for the Scottish

population and O' (O'Reilly, O'Conner, and so on) for the Irish population. Thus, using the simple sharding key based

on the first letter of the last name will place an undue number of user records on the shard that supports the letter

range “M-O.” Similarly, the shard that supports the letter range “X-Z” will perform very little work at all.

An important characteristic of a sharding system is that it must ensure that the data is spread evenly across

the available set of shard servers. This prevents hotspots that can affect the overall performance of the cluster from

developing. Let's call this Requirement 1: The ability to distribute data evenly across all shards.

Another thing to keep in mind: when you split your dataset across multiple servers, you effectively increase your

dataset's vulnerability to hardware failure. That is, you increase the chance that a single server failure will affect the

availability of your data as you add servers. Again, an important characteristic of a reliable sharding system is that—

like a RAID system commonly used with disk drives—it stores each piece of data on more than one server, and it can

tolerate individual shard servers becoming unavailable. Let's call this Requirement 2: The ability to store shard data in

a fault-tolerant fashion .

Search WWH ::

Custom Search

Home