Sorting is not technically required; the purpose is to group documents with the same
key together. Sorting is just one way to do this.
Finalize is not required, but can be useful for computing things like mean values
given a count and sum computed by the other parts of MapReduce.
MongoDB provides several options for storing your output data. In
this code, we're mimicking the replace output mode.
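The notes above can be made concrete with a minimal, single-process sketch in plain Python. It is not MongoDB's implementation: the `map_reduce` helper and the sample `events` list are hypothetical, and the `userid` and `length` field names are borrowed from the map function later in this section. Grouping uses a dictionary rather than a sort, finalize derives a mean from the reduced sum and count, and returning a fresh result on every run approximates the replace output mode.

```python
from collections import defaultdict


def map_reduce(docs, mapf, reducef, finalizef=None):
    """Run a toy, single-process MapReduce over an iterable of documents."""
    # Map phase: each document may emit any number of (key, value) pairs.
    # A dict groups values by key, so no sort is needed.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapf(doc):
            groups[key].append(value)

    # Reduce phase, followed by the optional finalize step.
    output = {}
    for key, values in groups.items():
        result = reducef(key, values)
        if finalizef is not None:
            result = finalizef(key, result)
        output[key] = result
    # Returning a brand-new dict each run mimics the "replace" output mode.
    return output


def mapf(doc):
    yield doc["userid"], {"total": doc["length"], "count": 1}


def reducef(key, values):
    return {
        "total": sum(v["total"] for v in values),
        "count": sum(v["count"] for v in values),
    }


def finalizef(key, value):
    # The mean is derived once, from the already-reduced sum and count.
    value = dict(value)
    value["mean"] = value["total"] / value["count"]
    return value


# Hypothetical sample data.
events = [
    {"userid": "alice", "length": 10},
    {"userid": "bob", "length": 4},
    {"userid": "alice", "length": 20},
]
result = map_reduce(events, mapf, reducef, finalizef)
```

Computing the mean in finalize rather than in reduce keeps the reduce output the same shape as its input, so reduce can safely be applied again to partial results.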
The nice thing about this algorithm is that each of the phases can be run in parallel. In
MongoDB, this benefit is somewhat limited by the presence, as of version 2.2, of a global
JavaScript interpreter lock that forces all JavaScript in a single MongoDB process to run
serially. Sharding allows you to get back some of this performance, but the full benefits of
MapReduce still await the removal of the JavaScript lock from MongoDB.
Operations
This section assumes that all events exist in the events collection and have a timestamp. The
operations are to aggregate from the events collection into the smallest aggregate—hourly
totals—and then aggregate from the hourly totals into coarser granularity levels. In all cases,
these operations will store aggregation time as a last_run variable.
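As a rough illustration of this two-stage flow, the following plain-Python sketch rolls events up into hourly totals and then rolls the hourly totals up into daily totals. It only models the idea and does not use MongoDB. The `rollup`, `to_hour`, and `to_day` helpers, the sample events, and the `cutoff` variable are all assumptions made for the example; only the `last_run` name comes from the text.

```python
from collections import defaultdict
from datetime import datetime


def to_hour(ts):
    # Truncate a timestamp to the start of its hour.
    return ts.replace(minute=0, second=0, microsecond=0)


def to_day(ts):
    # Truncate a timestamp to the start of its day.
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def rollup(rows, bucket):
    """Group rows by (user, bucket(ts)) and sum their totals and counts."""
    sums = defaultdict(lambda: {"total": 0, "count": 0})
    for row in rows:
        key = (row["u"], bucket(row["ts"]))
        sums[key]["total"] += row["total"]
        sums[key]["count"] += row["count"]
    return [{"u": u, "ts": ts, **v} for (u, ts), v in sorted(sums.items())]


# Hypothetical sample events; the last one predates last_run and is skipped.
events = [
    {"userid": "alice", "ts": datetime(2012, 5, 1, 9, 15), "length": 10},
    {"userid": "alice", "ts": datetime(2012, 5, 1, 9, 45), "length": 20},
    {"userid": "alice", "ts": datetime(2012, 5, 1, 14, 5), "length": 5},
    {"userid": "alice", "ts": datetime(2012, 4, 30, 23, 0), "length": 99},
]

last_run = datetime(2012, 5, 1, 0, 0)  # time of the previous aggregation
cutoff = datetime(2012, 5, 2, 0, 0)    # time of this aggregation

# Stage 1: only events newer than last_run feed the hourly totals.
new_events = [
    {"u": e["userid"], "ts": e["ts"], "total": e["length"], "count": 1}
    for e in events
    if last_run < e["ts"] <= cutoff
]
hourly = rollup(new_events, to_hour)

# Stage 2: the coarser level is built from the hourly totals, not raw events.
daily = rollup(hourly, to_day)

# Record the aggregation time so the next run starts where this one ended.
last_run = cutoff
```

Because each level is built from the one below it, the coarser aggregations never need to rescan the raw events.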
Creating hourly views from event collections
To do our lowest-level aggregation, we need to first create a map function, as shown here:
import bson

mapf_hour = bson.Code('''function() {
    // Key each event by user and by its timestamp truncated to the hour.
    var key = {
        u: this.userid,
        d: new Date(
            this.ts.getFullYear(),
            this.ts.getMonth(),
            this.ts.getDate(),
            this.ts.getHours(),
            0, 0, 0) };
    emit(
        key,
        {
            total: this.length,
            count: 1,
 
 
 