Sharding
In addition to replication, MongoDB supports automatically balanced
sharding of the data stored in collections (except capped collections). The
implementation is quite similar to the approach twemproxy uses to
cluster Redis, except that data may move between shards over time.
MongoDB's sharding implementation is designed to transition smoothly
from an unsharded environment to a sharded one. The usual
recommendation, when starting to use MongoDB, is to begin with an
unsharded dataset in a normal Replica Set and then add shards as the
dataset grows to the point that the working set approaches the RAM
available on a single server.
This works because MongoDB shards are simply Replica Sets.
To implement the sharding itself, MongoDB introduces an auxiliary server
called mongos. In combination with a configuration server, which is simply
another MongoDB Replica Set, the mongos server acts as an intermediary
between applications and the database shards. It is responsible for
distributing queries and collating results for return to a client. In this sense,
mongos is very similar to the twemproxy server used to shard Redis and
Memcached instances. A sharded MongoDB cluster may have any number
of mongos servers running at any given time. A common strategy is to
run a mongos process on each application server to simplify application
configuration.
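The routing role described above can be illustrated with a small, self-contained Python sketch. It is not MongoDB's implementation: the chunk map, the shard names (shardA, shardB), the user_id shard key, and the helper functions are all hypothetical, and the shards are plain in-memory lists rather than Replica Sets. It only shows the two jobs attributed to mongos here: sending a query to the shard that owns a key, and collating results when every shard must be asked.

```python
from bisect import bisect_right

# Hypothetical chunk map of the kind the configuration servers would hold:
# chunk i covers shard-key values from CHUNK_LOWER_BOUNDS[i] up to (but not
# including) the next lower bound; the last chunk is open-ended.
CHUNK_LOWER_BOUNDS = [0, 100, 200]
CHUNK_OWNERS = ["shardA", "shardB", "shardA"]

# Toy in-memory "shards" (assumption: real shards would be Replica Sets).
SHARDS = {
    "shardA": [{"user_id": 5}, {"user_id": 250}],
    "shardB": [{"user_id": 150}],
}


def shard_for_key(key):
    """Return the name of the shard owning the chunk that contains `key`."""
    index = bisect_right(CHUNK_LOWER_BOUNDS, key) - 1
    if index < 0:
        raise KeyError(f"no chunk covers key {key}")
    return CHUNK_OWNERS[index]


def find_by_key(key):
    """Targeted query: only the owning shard is consulted."""
    shard = shard_for_key(key)
    return [doc for doc in SHARDS[shard] if doc["user_id"] == key]


def find_all(predicate):
    """Scatter-gather query: ask every shard, then collate the results."""
    results = []
    for shard_name in SHARDS:
        results.extend(doc for doc in SHARDS[shard_name] if predicate(doc))
    return sorted(results, key=lambda doc: doc["user_id"])
```

For example, find_by_key(150) touches only shardB, while find_all(lambda doc: doc["user_id"] > 100) visits both shards and merges their answers. Because the router keeps no data of its own, any number of such routers can run side by side, which is why one mongos per application server is workable.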
The mongos server differs from twemproxy in that it is also responsible
for cluster rebalancing. Unlike many sharding approaches, which rely on
a fixed assignment of keys to shards, MongoDB attempts to automatically
balance data and load across the cluster. This is handled by a balancer process,
initiated by one of the mongos servers attached to the cluster. Normally this
process is automatic and started when mongos detects sufficient imbalance
in cluster resources. However, because rebalancing the cluster places a load
on the system, it is possible to configure the balancer to only run during
specific time windows when load is expected to be low (for example, early
morning hours for most Internet applications).
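As a rough illustration of the balancing behavior just described, the following Python sketch moves chunks from the fullest shard to the emptiest one, but only inside a configured time window. Everything in it is an assumption made for the example: the threshold value, the function names, and the use of a simple chunk count as the measure of imbalance. MongoDB's own balancer applies its own thresholds and migration protocol.

```python
from datetime import time

# Assumed threshold: migrate only while chunk counts differ by at least this.
MIGRATION_THRESHOLD = 2


def in_window(now, start, stop):
    """True if `now` is inside [start, stop); the window may wrap midnight."""
    if start <= stop:
        return start <= now < stop
    return now >= start or now < stop


def balance(chunks_by_shard, now, start, stop):
    """Move chunks from the fullest to the emptiest shard; return the moves."""
    moves = []
    if len(chunks_by_shard) < 2 or not in_window(now, start, stop):
        return moves  # nothing to balance, or outside the balancing window
    while True:
        donor = max(chunks_by_shard, key=lambda s: len(chunks_by_shard[s]))
        receiver = min(chunks_by_shard, key=lambda s: len(chunks_by_shard[s]))
        gap = len(chunks_by_shard[donor]) - len(chunks_by_shard[receiver])
        if gap < MIGRATION_THRESHOLD:
            return moves  # cluster is balanced closely enough
        chunk = chunks_by_shard[donor].pop()
        chunks_by_shard[receiver].append(chunk)
        moves.append((chunk, donor, receiver))
```

With a window of 23:00 to 06:00, a call at 03:00 on a cluster holding four chunks on one shard and none on another performs two migrations and leaves two chunks on each; the same call at noon returns immediately without moving anything.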
To start using sharding in MongoDB, a set of configuration servers
must first be established. These configuration servers play the same role
that ZooKeeper plays for Kafka, providing configuration metadata. Most
importantly, they contain information about how keys are distributed