Operational Intelligence - MongoDB Applied Design Patterns


Figure 4-5. Hierarchical aggregation

Schema Design

When designing the schema for event storage, it's important to track which events have already been included in the aggregation and which have not yet been included.

If you can batch your inserts into the events collection, you can use an autoincrement primary key by using the find_and_modify command to generate the _id values, as shown here:

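>>> # Atomically reserve a block of 50 ids in one round trip
>>> obj = db.my_sequence.find_and_modify(
...     query={'_id': 0},
...     update={'$inc': {'inc': 50}},
...     upsert=True,
...     new=True)
>>> # new=True returns the counter after the increment,
>>> # so the reserved ids are the 50 values just below it
>>> batch_of_ids = range(obj['inc'] - 50, obj['inc'])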
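The counter logic can be modeled without a database. The sketch below is a hypothetical in-memory stand-in (the `InMemorySequence` class and its `reserve` method are illustrative names, not part of MongoDB or PyMongo) that mirrors what the `$inc` update with `upsert=True` and `new=True` does: a missing counter starts at zero, is advanced by the batch size in a single atomic step, and the caller keeps the block of ids just below the new value. A lock stands in for the atomicity the server provides.

```python
import threading


class InMemorySequence:
    """Hypothetical stand-in for the my_sequence collection.

    Mimics {'$inc': {'inc': n}} with upsert=True and new=True:
    a missing counter starts at 0 and the post-increment value is used.
    """

    def __init__(self):
        self._counters = {}
        self._lock = threading.Lock()  # the real server makes the update atomic

    def reserve(self, seq_id, batch_size):
        with self._lock:
            new_value = self._counters.get(seq_id, 0) + batch_size
            self._counters[seq_id] = new_value
        # Same arithmetic as batch_of_ids: the block that was just reserved.
        return range(new_value - batch_size, new_value)


# Assumption: with PyMongo 3.x or later the driver call would look roughly like
#   from pymongo import ReturnDocument
#   obj = db.my_sequence.find_one_and_update(
#       {'_id': 0}, {'$inc': {'inc': 50}},
#       upsert=True, return_document=ReturnDocument.AFTER)

seq = InMemorySequence()
first = seq.reserve(0, 50)
second = seq.reserve(0, 50)
print(list(first)[:3], list(second)[:3])  # [0, 1, 2] [50, 51, 52]
```

Because each call advances the counter before handing out ids, two writers never receive overlapping blocks; the trade-off is that ids from a block that is never fully used are simply skipped. `find_and_modify` is the older PyMongo method name; newer PyMongo releases use `find_one_and_update` instead, so the commented call is an assumption about the driver version in use.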

However, in many cases you can simply include a timestamp with each event that you can use to distinguish processed events from unprocessed events.
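A minimal sketch of the timestamp approach, assuming the aggregation job remembers the time of its previous run (`last_run`) and fixes a `cutoff` for the current run, so that only events with `last_run < ts <= cutoff` count as unprocessed. `unprocessed_window_query` and `select_unprocessed` are hypothetical helper names: the first builds a filter that could be passed to `find` on the events collection, and the second applies the same test to an in-memory list so the logic can be checked without a server. The sample events are illustrative.

```python
from datetime import datetime


def unprocessed_window_query(last_run, cutoff):
    """Hypothetical helper: filter for events with last_run < ts <= cutoff."""
    return {'ts': {'$gt': last_run, '$lte': cutoff}}


def select_unprocessed(events, last_run, cutoff):
    """Apply the same half-open window to a list of event dicts."""
    return [e for e in events if last_run < e['ts'] <= cutoff]


last_run = datetime(2010, 10, 10, 14, 0)
cutoff = datetime(2010, 10, 10, 15, 0)

# Illustrative events: one before, one inside, one after the window.
events = [
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 13, 59), 'length': 40},
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 14, 17, 22), 'length': 95},
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 15, 5), 'length': 12},
]

pending = select_unprocessed(events, last_run, cutoff)
print([e['length'] for e in pending])  # [95]
```

With a half-open window the next run can start from this run's `cutoff` without counting any event twice. The sketch assumes events are written roughly in timestamp order; an event stored with a `ts` older than a window that has already been processed would be missed.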

This example assumes that you are calculating the average session length for logged-in users on a website. The events will have the following form:

{
    "userid": "rick",
    "ts": ISODate('2010-10-10T14:17:22Z'),
    "length": 95
}
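As a rough illustration of the aggregation described next, the sketch below handles only the hourly level. Each event is keyed by user and by its timestamp truncated to the hour (playing the role of an `_id` made of a user and a date), then folded into a running `count` and `total` from which the `mean` is recomputed. The helper names (`hour_key`, `fold`, `merge`) and the field names `count`, `total`, and `mean` are assumptions for this sketch, since the stored aggregate document is only partly shown here. `merge` shows why storing counts matters when hours are rolled up into a day.

```python
from datetime import datetime


def hour_key(event):
    """Hypothetical helper: (user, hour) pair acting as the aggregate key."""
    hour = event['ts'].replace(minute=0, second=0, microsecond=0)
    return (event['userid'], hour)


def fold(value, event):
    """Add one event to an aggregate (assumed fields count/total/mean)."""
    count = value.get('count', 0) + 1
    total = value.get('total', 0) + event['length']
    return {'count': count, 'total': total, 'mean': total / count}


def merge(a, b):
    """Combine two aggregates, e.g. two hours into a day, without raw events."""
    count = a['count'] + b['count']
    total = a['total'] + b['total']
    return {'count': count, 'total': total, 'mean': total / count}


# Illustrative events for one user spread over two hours.
events = [
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 14, 17, 22), 'length': 95},
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 14, 48, 0), 'length': 45},
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 15, 3, 0), 'length': 25},
]

hourly = {}
for ev in events:
    key = hour_key(ev)
    hourly[key] = fold(hourly.get(key, {}), ev)

h14 = hourly[('rick', datetime(2010, 10, 10, 14, 0))]
h15 = hourly[('rick', datetime(2010, 10, 10, 15, 0))]
print(h14)              # {'count': 2, 'total': 140, 'mean': 70.0}
print(merge(h14, h15))  # {'count': 3, 'total': 165, 'mean': 55.0}
```

Averaging the two hourly means directly would give (70 + 25) / 2 = 47.5, which weights the single 15:00 session as heavily as the two 14:00 sessions; summing counts and totals first gives the correct 165 / 3 = 55.0.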

The operations described here calculate the total and average session times for each user at the hour, day, week, month, and year levels. For each aggregation, we'll store the number of sessions so that MongoDB can incrementally recompute the average session times. The aggregate document will resemble the following:

{
    _id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z") },
    value: {
        ts: ISODate('2010-10-10T15:01:00Z'),
