Now you know how to start and stop the balancer, and how to check what the balancer is doing at a given point.
You will also want to be able to set a window when the balancer will be active. As an example, let's set our balancer
to run between 8PM and 6AM, which lets it run overnight when our cluster is (hypothetically) less active. To do this,
we update the balancer settings document from before, as it controls whether the balancer is running. The exchange
looks like this:
> use config
switched to db config
> db.settings.update({ _id: "balancer" }, { $set: { activeWindow: { start: "20:00", stop: "06:00" } } })
And that will do it; your balancer document will now have an activeWindow that will start it at 8PM and stop it at
6AM. You should now be able to start and stop the balancer, confirm its state and when it was last running, and finally
set a time window in which the balancer is active.
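If you want to confirm that the window took effect (or remove it later), you can inspect and edit the same settings document. A sketch of that exchange, to be run from a mongos shell against a live cluster (output will vary with your deployment):

```javascript
// Confirm the activeWindow was stored on the balancer settings document:
db.getSiblingDB("config").settings.find({ _id: "balancer" })

// To remove the window again and let the balancer run at any time:
db.getSiblingDB("config").settings.update(
    { _id: "balancer" },
    { $unset: { activeWindow: true } }
)
```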
Hashed Shard Keys
Earlier we discussed how important it is to pick the correct shard key. If you pick the wrong shard key, you can cause
all kinds of performance problems. Take, for example, sharding on _id , which is an ever-increasing value. Each insert
you make will be sent to the shard in your set that currently holds the highest _id value. As each new insert is the
“largest” value that has been inserted, you will always be inserting data to the same place. This means you will have
one “hot” shard in your cluster that is receiving all inserts and has all documents being migrated from it to the other
shards—not very efficient.
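To see why, here is a small standalone sketch in plain JavaScript (the chunk layout is invented for illustration; this is not MongoDB code) of how range-based routing sends every insert of an ever-increasing key to the same top chunk:

```javascript
// Hypothetical chunk layout: with range-based sharding on an
// ever-increasing key, every new insert lands in the chunk covering
// the highest range, i.e. always the same "hot" chunk/shard.
const chunks = [
  { min: 0,    max: 1000,     shard: "shard0" },
  { min: 1000, max: 2000,     shard: "shard1" },
  { min: 2000, max: Infinity, shard: "shard2" }, // top chunk: [2000, MaxKey)
];

function shardFor(key) {
  return chunks.find(c => key >= c.min && key < c.max).shard;
}

// Simulate 100 inserts with a key that keeps increasing past existing data.
const hits = {};
for (let key = 2500; key < 2600; key++) {
  const s = shardFor(key);
  hits[s] = (hits[s] || 0) + 1;
}
console.log(hits); // every insert went to shard2 — the hot shard
```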
To help solve this problem, MongoDB 2.4 introduced a new feature: hashed shard keys! A hashed shard
key will create a hash for each of the values on a given field and then use these hashes to perform the chunking and
sharding operations. This allows you to take an increasing value such as an _id field and generate a hash for each
given _id value, which will give randomness to values. Adding this level of randomness should normally allow you
to distribute writes evenly to all shards. The cost, however, is that you'll have random reads as well, which can be a
performance penalty if you wish to perform operations over a range of documents. For this reason hashed sharding
may be inefficient when compared with a user-selected shard key under certain workloads.
Note Because of the way hashing is implemented, there are some limitations when you shard on floating-point
(decimal) numbers: values such as 2.3, 2.4, and 2.9 will become the same hashed value.
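That collision comes from a truncation step: MongoDB converts floating-point values to 64-bit integers before hashing them, so the fractional part is simply dropped. A minimal sketch of that step in plain JavaScript (not MongoDB's actual code):

```javascript
// The integer-truncation step that makes 2.3, 2.4, and 2.9 hash
// identically: they all truncate to the same hash input, 2.
function toHashInput(value) {
  return Math.trunc(value);
}

console.log(toHashInput(2.3), toHashInput(2.4), toHashInput(2.9)); // 2 2 2
console.log(toHashInput(3.1)); // 3: a different hash input
```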
So, to create a hashed shard key, we simply run shardCollection and specify the "hashed" index type:
sh.shardCollection( "testdb.testhashed", { _id: "hashed" } )
And that's it! You have now created a hashed shard key, which will hash the incoming _id values in order to
distribute your data in a more “random” fashion. Now, with all this in mind, some of you may be asking: why not
always use a hashed shard key?
Good question; the answer is that sharding is just one of “those” dark arts. The optimum shard key is one that
allows your writes to be distributed well over a number of shards, so that the writes are effectively parallel. It is also a
key that lets you group documents so that queries are routed to only one or a limited number of shards, and it must
allow you to make more effective use of the indexes held on the individual shards. All of those factors will be
determined by your use case, what you are storing, and how you are retrieving it.
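As an illustration (the collection and field names here are invented), a compound shard key can strike that balance: the first field groups one customer's documents so queries for that customer target few shards, while the second field splits a large customer's data across chunks. Run from a mongos shell against a live cluster:

```javascript
// Hypothetical compound shard key; field names are examples only.
sh.shardCollection( "testdb.orders", { customerId: 1, orderDate: 1 } )
```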