{
  _id: ObjectId("4d750a90c35169d10fc8c982"),
  domain: "org.mongodb",
  url: "/downloads",
  period: "2011-12"
}
The simplest shard key for a sharded collection containing documents like this would
consist of each page's domain followed by its url: {domain: 1, url: 1}. All pages from
a given domain would generally live on a single shard, but the outlier domains with
massive numbers of pages would still be split across shards when necessary.
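To make this concrete, here's a minimal sketch of declaring that shard key from a mongo
shell connected to a mongos. The database name events and collection name pageviews
are hypothetical stand-ins, not names taken from the sample cluster:

  use admin
  db.runCommand({enableSharding: "events"})
  db.runCommand({shardCollection: "events.pageviews", key: {domain: 1, url: 1}})

Once the collection is sharded this way, documents with the same domain sort together
in the key space, so chunks split cleanly along domain and then url boundaries.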
9.5 Sharding in production
When deploying a shard cluster to production, you're presented with a number of
choices and challenges. Here I describe a couple of recommended deployment
topologies and provide some answers to common deployment questions. We'll then
consider matters of server administration, including monitoring, backups, failover,
and recovery.
9.5.1 Deployment and configuration
Deployment and configuration are hard to get right the first time around. The following
are some guidelines for organizing the cluster and configuring it with ease.
DEPLOYMENT TOPOLOGIES
To launch the sample MongoDB shard cluster, you had to start a total of nine
processes (three mongods for each replica set, plus three config servers). That's a poten-
tially frightening number. First-time users might assume that running a two-shard
cluster in production would require nine separate machines. Fortunately, many fewer
are needed. You can see why by looking at the expected resource requirements for
each component of the cluster.
Consider first the replica sets. Each replicating member contains a complete copy
of the data for its shard and may run as a primary or secondary node. These processes
will always require enough disk space to store their copy of the data, and enough RAM
to serve that data efficiently. Thus replicating mongods are the most resource-intensive
processes in a shard cluster and must be given their own machines.
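For a rough point of reference, a data-bearing shard member is launched as an ordinary
replica set mongod with the --shardsvr option; the replica set name, paths, and port
below are hypothetical placeholders:

  mongod --shardsvr --replSet shard-a --dbpath /data/rs-a-1 --port 30000 \
      --fork --logpath /var/log/mongodb/rs-a-1.log

Each such process should get its own machine, with disk and RAM sized for the shard's
full data set.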
What about replica set arbiters? These processes store replica set config data only,
which is kept in a single document. Hence, arbiters incur little overhead and certainly
don't need their own servers.
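As a sketch, an arbiter runs the same mongod binary with a trivially small data
directory (the paths, port, and host name here are placeholders):

  mongod --replSet shard-a --dbpath /data/arbiter --port 30002 \
      --fork --logpath /var/log/mongodb/arbiter.log

Then, from a mongo shell connected to the shard-a primary:

  rs.addArb("arbiter-host:30002")

Because the arbiter holds no data, it can comfortably share a machine with, say, a
config server or an application process.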
Next are the config servers. These also store a relatively small amount of data. For
instance, the data on the config servers managing the sample shard cluster totaled only
about 30 KB. If you assume that this data will grow linearly with shard cluster data size,
then a 1 TB shard cluster might swell the config servers' data size to a mere 30 MB.14
This means that config servers don't necessarily need their own machines, either. But
14 That's a highly conservative estimate. The real value will likely be far smaller.