last_run = cutoff
As long as we save and restore the last_run variable between aggregations, we can run these aggregations as often as we like, since each aggregation operation is incremental (i.e., using output mode 'reduce').
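The save/restore step can be sketched as a pair of small helpers that keep the timestamp in a dedicated state document between runs. The collection name and helper names here are assumptions for illustration, not from the text, and the code is duck-typed over any object with pymongo-3-style find_one()/replace_one() methods:

```python
from datetime import datetime

def load_last_run(state, default):
    # `state` is any collection-like object with find_one()/replace_one();
    # in production this would be a pymongo collection, e.g. a
    # (hypothetical) db.stats.state.
    doc = state.find_one({'_id': 'last_run'})
    return doc['value'] if doc is not None else default

def save_last_run(state, cutoff):
    # Upsert so the very first run creates the state document.
    state.replace_one(
        {'_id': 'last_run'},
        {'_id': 'last_run', 'value': cutoff},
        upsert=True)
```

Each aggregation run would call load_last_run before querying, then save_last_run with the cutoff once the incremental map-reduce completes, mirroring the last_run = cutoff assignment above.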
Sharding Concerns
When sharding, we need to ensure that we don't choose the incoming timestamp as a shard key, but rather something that varies significantly in the most recent documents. In the previous example, we might consider using the userid as the most significant part of the shard key.
To prevent a single, active user from creating a large chunk that MongoDB cannot split, we'll use a compound shard key with username-timestamp on the events collection as follows:

>>> db.command('shardcollection', 'dbname.events', {
...     'key': { 'userid': 1, 'ts': 1 } })
{ "collectionsharded": "dbname.events", "ok" : 1 }
To shard the aggregated collections, we must use the _id field to work well with mapreduce, so we'll issue the following group of shard operations in the shell:
>>> db.command('shardcollection', 'dbname.stats.daily', {
...     'key': { '_id': 1 } })
{ "collectionsharded": "dbname.stats.daily", "ok" : 1 }
>>> db.command('shardcollection', 'dbname.stats.weekly', {
...     'key': { '_id': 1 } })
{ "collectionsharded": "dbname.stats.weekly", "ok" : 1 }
>>> db.command('shardcollection', 'dbname.stats.monthly', {
...     'key': { '_id': 1 } })
{ "collectionsharded": "dbname.stats.monthly", "ok" : 1 }
>>> db.command('shardcollection', 'dbname.stats.yearly', {
...     'key': { '_id': 1 } })
{ "collectionsharded": "dbname.stats.yearly", "ok" : 1 }
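Since all four stats collections take the same shard key, the repeated commands can also be generated in a loop. This sketch only builds the command arguments; issuing them requires a live sharded cluster, so the actual db.command calls are left commented out:

```python
# Build the shardcollection arguments for each aggregate collection.
periods = ['daily', 'weekly', 'monthly', 'yearly']
commands = [
    ('shardcollection', 'dbname.stats.' + period, {'key': {'_id': 1}})
    for period in periods]

# Against a live cluster, one would then run (assuming a pymongo `db`
# handle, as in the examples above):
# for name, namespace, options in commands:
#     db.command(name, namespace, **options)
```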
We also need to update the h_aggregate MapReduce wrapper to support sharded output by adding 'sharded': True to the out argument. Our new h_aggregate now looks like this: