Databases Reference
In-Depth Information
of a blog where each post has tags. We want to find the most popular tag for each day.
We can group by day (again) and keep a count for each tag. This might look something
like this:
> db.posts.group({
... "key" : {"tags" : true},
... "initial" : {"tags" : {}},
... "$reduce" : function(doc, prev) {
... for (i in doc.tags) {
... if (doc.tags[i] in prev.tags) {
... prev.tags[doc.tags[i]]++;
... } else {
... prev.tags[doc.tags[i]] = 1;
... }
... }
... }})
This will return something like this:
[
{"day" : "2010/01/12", "tags" : {"nosql" : 4, "winter" : 10, "sledding" : 2}},
{"day" : "2010/01/13", "tags" : {"soda" : 5, "php" : 2}},
{"day" : "2010/01/14", "tags" : {"python" : 6, "winter" : 4, "nosql": 15}}
]
Then we could find the largest value in the "tags" document on the client side. How-
ever, sending the entire tags document for every day is a lot of extra overhead to send
to the client: an entire set of key/value pairs for each day, when all we want is a single
string. This is why group takes an optional "finalize" key. "finalize" can contain a
function that is run on each group once, right before the result is sent back to the client.
We can use a "finalize" function to trim out all of the cruft from our results:
> db.runCommand({"group" : {
... "ns" : "posts",
... "key" : {"tags" : true},
... "initial" : {"tags" : {}},
... "$reduce" : function(doc, prev) {
... for (i in doc.tags) {
... if (doc.tags[i] in prev.tags) {
... prev.tags[doc.tags[i]]++;
... } else {
... prev.tags[doc.tags[i]] = 1;
... }
... },
... "finalize" : function(prev) {
... var mostPopular = 0;
... for (i in prev.tags) {
... if (prev.tags[i] > mostPopular) {
... prev.tag = i;
... mostPopular = prev.tags[i];
... }
... }
... delete prev.tags
... }}})
 
Search WWH ::




Custom Search