Sharding Concerns
If you need to shard the data for this system, the _id field is a reasonable choice of shard key, since most updates use the _id field in their spec, allowing mongos to route each update to a single mongod process. There are a couple of potential drawbacks to using _id, however:
▪ If the cart collection's _id is an increasing value such as an ObjectId(), all new carts end up on a single shard.
▪ Cart expiration and inventory adjustment require update operations and queries to broadcast to all shards when using _id as a shard key.
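The routing behavior behind both drawbacks can be illustrated with a toy model. This is only a sketch, not how mongos is actually implemented: the split points, the three-shard layout, the target_shards helper, and the status and last_modified field names are all hypothetical, chosen purely for illustration.

```python
import bisect

# Hypothetical chunk boundaries on _id for a 3-shard cluster (toy model only).
# Shard 0 owns keys < "5", shard 1 owns ["5", "b"), shard 2 owns keys >= "b".
SPLIT_POINTS = ["5", "b"]
NUM_SHARDS = len(SPLIT_POINTS) + 1


def target_shards(query):
    """Return the shards a query must visit in this toy model."""
    if "_id" in query and isinstance(query["_id"], str):
        # Equality match on the shard key: exactly one shard owns the key.
        return [bisect.bisect_right(SPLIT_POINTS, query["_id"])]
    # No shard key in the query: every shard has to be asked.
    return list(range(NUM_SHARDS))


# An update addressed by _id is routed to a single shard.
single = target_shards({"_id": "7f3a"})

# An expiration-style query on other fields is broadcast to all shards.
broadcast = target_shards({"status": "active", "last_modified": {"$lt": 0}})

print(single, broadcast)
```

In this model, any query that omits _id fans out to every shard, which is the cost that cart expiration and inventory adjustment pay.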
It's possible to mitigate at least the first problem by using a pseudorandom value for _id when creating a cart. A reasonable approach would be the following:
    import hashlib

    import bson

    def new_cart():
        # A fresh ObjectId supplies a unique input for the hash.
        object_id = bson.ObjectId()
        # md5() needs bytes on Python 3, so encode the string form first.
        cart_id = hashlib.md5(str(object_id).encode()).hexdigest()
        return cart_id
We create a bson.ObjectId() to get a unique value to feed into the hash. Because ObjectId uses the current timestamp as its most significant bits, it is not itself an appropriate choice of shard key. Hashing the object_id with MD5 randomizes it, producing a string that is extremely likely to be unique in our system.
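To see why hashing helps, the following sketch compares the leading character of timestamp-prefixed IDs with that of their MD5 digests. It does not use real ObjectIds: fake_object_id is a hypothetical stand-in that only mimics the timestamp-first layout, and the base timestamp is an arbitrary assumption.

```python
import hashlib
from collections import Counter


def fake_object_id(timestamp, counter):
    """Hypothetical stand-in for an ObjectId hex string: timestamp first."""
    return f"{timestamp:08x}{counter:016x}"


def first_char_spread(ids):
    """Count how often each leading hex character occurs."""
    return Counter(i[0] for i in ids)


base = 1_700_000_000  # arbitrary starting timestamp, in seconds
raw_ids = [fake_object_id(base + n, n) for n in range(1000)]
hashed_ids = [hashlib.md5(i.encode()).hexdigest() for i in raw_ids]

raw_spread = first_char_spread(raw_ids)
hashed_spread = first_char_spread(hashed_ids)

print("distinct leading characters, raw IDs:   ", len(raw_spread))
print("distinct leading characters, hashed IDs:", len(hashed_spread))
```

The raw IDs all share the same leading character, so a range-based shard key would send them to the same chunk, while the hashed IDs spread across the hexadecimal range.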
To actually perform the sharding, we'd execute the following commands:
    >>> db.command('shardcollection', 'dbname.inventory',
    ...     key={'_id': 1})
    { "collectionsharded" : "dbname.inventory", "ok" : 1 }
    >>> db.command('shardcollection', 'dbname.cart',
    ...     key={'_id': 1})
    { "collectionsharded" : "dbname.cart", "ok" : 1 }