Database Reference
In-Depth Information
MongoDB has a unique method for sharding, where a MongoS routing process manages the splitting of the
data and the routing of requests to the required shard server. If a query requires data from multiple shards, then the
MongoS will manage the process of merging the data obtained from each shard back into a single cursor.
This feature, more than any other, is what earns MongoDB its stripes as a cloud or web-oriented database.
Analyzing a Simple Sharding Scenario
Let's assume you want to implement a simple sharding solution for a fictitious Gaelic social network. Figure 12-1
shows a simplified representation of how this application could be sharded.
Users A-C
....
Users M-O
Client
....
Users X-Z
Figure 12-1. Simple sharding of a User collection
There are a number of problems with this simplified view of our application. Let's look at the most obvious ones.
First, if your Gaelic network is targeted at the Irish and Scottish communities around the world, then the database
will have a large number of names that start with Mac and Mc (MacDonald, McDougal, and so on) for the Scottish
population and O' (O'Reilly, O'Conner, and so on) for the Irish population. Thus, using the simple sharding key based
on the first letter of the last name will place an undue number of user records on the shard that supports the letter
range “M-O.” Similarly, the shard that supports the letter range “X-Z” will perform very little work at all.
An important characteristic of a sharding system is that it must ensure that the data is spread evenly across
the available set of shard servers. This prevents hotspots that can affect the overall performance of the cluster from
developing. Let's call this Requirement 1: The ability to distribute data evenly across all shards.
Another thing to keep in mind: when you split your dataset across multiple servers, you effectively increase your
dataset's vulnerability to hardware failure. That is, you increase the chance that a single server failure will affect the
availability of your data as you add servers. Again, an important characteristic of a reliable sharding system is that—
like a RAID system commonly used with disk drives—it stores each piece of data on more than one server, and it can
tolerate individual shard servers becoming unavailable. Let's call this Requirement 2: The ability to store shard data in
a fault-tolerant fashion .
 
 
 
 
 
 
 
Search WWH ::




Custom Search