chunks:
    shard0000  2
    shard0001  3
    { "testkey" : { "$minKey" : 1 } } -->> { "testkey" : 0 } on : shard0000 Timestamp(4, 0)
    { "testkey" : 0 } -->> { "testkey" : 14860 } on : shard0000 Timestamp(3, 1)
    { "testkey" : 14860 } -->> { "testkey" : 45477 } on : shard0001 Timestamp(4, 1)
    { "testkey" : 45477 } -->> { "testkey" : 76041 } on : shard0001 Timestamp(3, 4)
    { "testkey" : 76041 } -->> { "testkey" : { "$maxKey" : 1 } } on : shard0001 Timestamp(3, 5)
This output lists the shard servers, the configuration of each sharded database and collection, and each chunk in
the sharded dataset. Because you used a small chunkSize value to simulate a larger sharding setup, the report lists
many chunks. One important piece of information you can read from this listing is the range of shard key values
associated with each chunk; the output also shows which shard server each chunk is stored on. You can use
the output of this command as the basis for a tool that analyzes how keys and chunks are distributed across the
shard servers, for example to determine whether there is any clumping of data in the dataset.
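You do not have to parse the sh.status() text to do this; the chunk metadata is also stored in the config database that
mongos maintains. The following is a minimal sketch, run from the mongo shell against a mongos instance, that groups
the chunk documents for the example collection by shard (the namespace testdb.testcollection comes from the examples
in this chapter):

use config
// count how many chunks each shard currently holds for the example collection
db.chunks.aggregate([
    { $match : { ns : "testdb.testcollection" } },
    { $group : { _id : "$shard", chunkCount : { $sum : 1 } } },
    { $sort : { chunkCount : -1 } }
])

The counts returned should match the per-shard totals reported by sh.status(), in this case two chunks on shard0000
and three on shard0001.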
Using Replica Sets to Implement Shards
The examples you have seen so far rely on a single mongod instance to implement each shard. In Chapter 11, you learned how
to create replica sets, which are clusters of mongod instances working together to provide redundant and fail-safe storage.
When adding shards to the sharded cluster, you can provide the name of a replica set and the address of one of its
members, and that shard will be backed by all the members of the replica set. The mongos process tracks which member
is the primary for the replica set and makes sure that all writes for that shard are directed to that instance.
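For example, assuming a replica set named shardRS0 with a member reachable at localhost:27018 (both the set name
and the host are placeholders for this sketch), you would add it as a shard from the mongos shell as follows; you only
need to list one reachable member, and mongos discovers the rest of the set automatically:

sh.addShard("shardRS0/localhost:27018")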
Combining sharding and replica sets enables you to create high-performance, highly reliable clusters that
can tolerate multi-machine failure. It also enables you to maximize the performance and availability of cheap,
commodity-class hardware.
Note  The ability to use replica sets as a storage mechanism for shards satisfies "requirement 2: the ability to store
shard data in a fault-tolerant fashion."
The Balancer
We've previously discussed how MongoDB can automatically keep your workload distributed among all the shards
in your cluster. While you may think that this is done via some form of patented MongoDB-Magic, that's not the case.
Your mongos process has an element within it called the balancer, which moves the logical chunks of data around
within your cluster to ensure that they remain evenly distributed among all your shards. The balancer speaks to the
shards and tells them to migrate data from one shard to another. You can see the distribution of chunks in the
sh.status() output in the following example; here my data is partitioned with two chunks on shard0000 and three
on shard0001.
{ "_id" : "testdb", "partitioned" : true, "primary" : "shard0000" }
testdb.testcollection
shard key: { "testkey" : 1 }
chunks:
shard0000 2
shard0001 3
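If you want to check on the balancer yourself, the mongo shell provides a few helpers you can run against a mongos
instance. A small sketch follows; the exact return values vary slightly between MongoDB versions:

sh.getBalancerState()     // true if the balancer is enabled for the cluster
sh.isBalancerRunning()    // reports whether a balancing round is in progress right now
sh.stopBalancer()         // temporarily disable balancing, for example during maintenance
sh.startBalancer()        // re-enable balancing when you are done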
 
 