Figure 4-5. Hierarchical aggregation
Schema Design
When designing the schema for event storage, it's important to track which events have already been included in the aggregation and which have not yet been included.
If you can batch your inserts into the events collection, you can use an autoincrement primary key by using the find_and_modify command to generate the _id values, as shown here:
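>>> # Atomically add 50 to the counter, creating it if needed (upsert=True),
>>> # and return the updated document (new=True).
>>> obj = db.my_sequence.find_and_modify(
...     query={'_id': 0},
...     update={'$inc': {'inc': 50}},
...     upsert=True,
...     new=True)
>>> # The 50 ids just reserved are the values below the new counter value.
>>> batch_of_ids = range(obj['inc'] - 50, obj['inc'])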
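To try the range arithmetic without a server, the sketch below replaces the `my_sequence` document with a hypothetical in-memory `SequenceCounter` class whose lock stands in for MongoDB's atomic `$inc`. It assumes, as the upsert implies, that the counter starts at 0, so the first batch covers ids 0 through 49.

```python
import threading


class SequenceCounter:
    """Hypothetical in-memory stand-in for the my_sequence counter document."""

    def __init__(self):
        self._value = 0               # like a freshly upserted {'inc': 0}
        self._lock = threading.Lock()

    def reserve(self, batch_size):
        # Add batch_size and read back the *new* value in one atomic step,
        # mirroring find_and_modify(..., new=True).
        with self._lock:
            self._value += batch_size
            new_value = self._value
        # The reserved ids are the batch_size values just below new_value.
        return range(new_value - batch_size, new_value)


counter = SequenceCounter()
first = counter.reserve(50)    # range(0, 50)
second = counter.reserve(50)   # range(50, 100)
```

Each reservation costs one atomic increment no matter how large the batch is, which is what makes the approach practical only when inserts are batched. In PyMongo 3.x and later the equivalent call is `find_one_and_update(..., upsert=True, return_document=ReturnDocument.AFTER)`; `find_and_modify` was removed in PyMongo 4.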
However, in many cases you can simply include a timestamp with each event and use it to distinguish processed events from unprocessed ones.
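One way to use such timestamps, sketched here under the assumption that each run remembers a hypothetical `last_run` watermark and fixes a `cutoff` time before it starts, is to select only the events after `last_run` and up to `cutoff`. `unprocessed_query` builds the corresponding MongoDB filter document; `select_unprocessed` applies the same window to plain dictionaries so the logic can be checked locally. Both names are illustrative, not part of any library.

```python
from datetime import datetime, timezone


def unprocessed_query(last_run, cutoff):
    # MongoDB filter for events newer than the previous run, up to cutoff.
    # The window is half-open (last_run, cutoff], so consecutive runs
    # never count the same event twice.
    return {'ts': {'$gt': last_run, '$lte': cutoff}}


def select_unprocessed(events, last_run, cutoff):
    # The same window applied to in-memory event dicts, for local testing.
    return [e for e in events if last_run < e['ts'] <= cutoff]


# Example run covering 14:00 (exclusive) to 15:00 (inclusive) UTC.
last_run = datetime(2010, 10, 10, 14, 0, tzinfo=timezone.utc)
cutoff = datetime(2010, 10, 10, 15, 0, tzinfo=timezone.utc)
events = [
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 14, 17, 22, tzinfo=timezone.utc), 'length': 95},
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 15, 30, tzinfo=timezone.utc), 'length': 40},
]
batch = select_unprocessed(events, last_run, cutoff)  # only the 14:17:22 event
```

The next run would use the current `cutoff` as its `last_run`. Events written with a `ts` older than a window that has already been processed would be missed, so this sketch assumes events are inserted roughly in timestamp order.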
This example assumes that you are calculating average session length for logged-in users on
a website. The events will have the following form:
{
    "userid": "rick",
    "ts": ISODate('2010-10-10T14:17:22Z'),
    "length": 95
}
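Keeping a session count next to each total is what allows averages to be updated incrementally. In the hypothetical sketch below, each aggregate holds `total`, `count`, and `mean` (field names assumed for illustration), keyed by user and hour in the spirit of the aggregate `_id` in this section. `merge` recomputes the mean as (total_a + total_b) / (count_a + count_b) instead of averaging the two means, which would be wrong whenever the counts differ.

```python
from datetime import datetime, timezone


def hour_key(event):
    # Hypothetical key: (userid, ts truncated to the hour), mirroring the
    # u and d fields of the aggregate _id.
    return (event['userid'],
            event['ts'].replace(minute=0, second=0, microsecond=0))


def merge(a, b):
    # Combine two partial aggregates from their totals and counts.
    total = a['total'] + b['total']
    count = a['count'] + b['count']
    return {'total': total, 'count': count, 'mean': total / count}


def fold_events(aggregates, events):
    # Fold new events into the per-(user, hour) aggregates dict in place.
    empty = {'total': 0, 'count': 0, 'mean': 0.0}
    for e in events:
        key = hour_key(e)
        one = {'total': e['length'], 'count': 1, 'mean': float(e['length'])}
        aggregates[key] = merge(aggregates.get(key, empty), one)
    return aggregates


aggregates = {}
fold_events(aggregates, [
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 14, 17, 22, tzinfo=timezone.utc), 'length': 95},
])
# A later run folds in new events without revisiting the first one.
fold_events(aggregates, [
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 14, 45, tzinfo=timezone.utc), 'length': 5},
])
```

For example, merging an hour with a total of 100 over 2 sessions into one with a total of 30 over 1 session gives a mean of 130 / 3 ≈ 43.3, whereas averaging the two means (50 and 30) would give 40. The same `merge` applies when rolling hours into days, days into weeks, and so on.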
The operations described here will calculate total and average session times for each user at the hour, day, week, month, and year. For each aggregation, we'll store the number of sessions so that MongoDB can incrementally recompute the average session times. The aggregate document will resemble the following:
{
    _id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z") },
    value: {
        ts: ISODate('2010-10-10T15:01:00Z'),