When you use the count() function at this juncture, the number of unique items won't be correct:
>>> collection.find({}).count()
4
Instead, you can use the distinct() function to ensure that any duplicates are ignored:
>>> collection.distinct("ItemNumber")
[u'1234EXD', u'2345FDX', u'3456TFS']
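The difference between the two calls can be sketched in plain Python, without a running MongoDB server. This is only an illustration: the sample documents and the distinct_values() helper are hypothetical, not part of PyMongo, and the real distinct() call does not promise any particular ordering of its results.

```python
# Hypothetical sample data: four documents, one ItemNumber appears twice.
items = [
    {"ItemNumber": "1234EXD"},
    {"ItemNumber": "2345FDX"},
    {"ItemNumber": "2345FDX"},
    {"ItemNumber": "3456TFS"},
]


def distinct_values(documents, key):
    """Return the unique values stored under `key`, in first-seen order."""
    unique = []
    for doc in documents:
        if key in doc and doc[key] not in unique:
            unique.append(doc[key])
    return unique


# Counting documents includes the duplicate; collecting distinct values does not.
total = len(items)
unique_item_numbers = distinct_values(items, "ItemNumber")
print(total)
print(unique_item_numbers)
```

Counting every document reports four items, while collecting the distinct values yields only three, which is why distinct() is the better fit when duplicates must be ignored.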
Grouping Data with the Aggregation Framework
The aggregation framework is a great tool for calculating aggregated values without needing to use MapReduce.
Although MapReduce is very powerful (and available through the PyMongo driver), the aggregation framework can get
most jobs done just as well, with better performance. To demonstrate this, $group, one of the aggregate() function's
most powerful pipeline operators, will be used to group the previously added documents by their tags and count the
documents in each group using the $sum aggregation expression. Let's look at an example:
>>> collection.aggregate([
... {'$unwind' : '$Tags'},
... {'$group' : {'_id' : '$Tags', 'Totals' : {'$sum' : 1}}}
... ])
First, the aggregate() function uses the $unwind pipeline operator to create a stream of documents from each
document's 'Tags' array, written as '$Tags' in the pipeline (note the mandatory $ prefix), producing one document per
tag. Next, the $group pipeline operator is called, creating a separate row for every unique tag, using the tag's value
as its '_id' and the $sum expression to calculate the 'Totals' value. The resulting output looks like this:
{
u'ok': 1.0,
u'result': [
{u'_id': u'Laptop', u'Totals': 4},
{u'_id': u'In Use', u'Totals': 3},
{u'_id': u'Development', u'Totals': 3},
{u'_id': u'Storage', u'Totals': 1},
{u'_id': u'Not used', u'Totals': 1}
]
}
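To make the two stages more concrete, here is a minimal plain-Python sketch of what $unwind and $group with {'$sum': 1} compute. It is not how MongoDB implements them. The sample documents and the unwind() and group_count() helpers are hypothetical names introduced for illustration only.

```python
from collections import Counter

# Hypothetical sample documents, each carrying a 'Tags' array.
docs = [
    {"ItemNumber": "A1", "Tags": ["Laptop", "In Use"]},
    {"ItemNumber": "B2", "Tags": ["Laptop", "Storage"]},
    {"ItemNumber": "C3", "Tags": ["Laptop", "In Use"]},
]


def unwind(documents, field):
    """Yield one copy of each document per element of its array field."""
    for doc in documents:
        for element in doc.get(field, []):
            flattened = dict(doc)       # shallow copy, original stays intact
            flattened[field] = element  # the array is replaced by one element
            yield flattened


def group_count(documents, field, total_name="Totals"):
    """Group documents by `field` and count them, like {'$sum': 1}."""
    counts = Counter(doc[field] for doc in documents)
    return [{"_id": value, total_name: n} for value, n in counts.items()]


unwound = list(unwind(docs, "Tags"))
grouped = group_count(unwound, "Tags")
for row in grouped:
    print(row)
```

Three documents with two tags each unwind into six single-tag documents, and grouping those by tag gives one row per unique tag together with its count.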
The output of aggregate() contains exactly the information that was requested. However, what if we wish to sort the
output by its 'Totals' value? This can be achieved by simply adding another pipeline operator, $sort. Before doing so,
however, we need to import the SON class from the bson.son module:
>>> from bson.son import SON
Now we can sort the results in descending order (-1) based on the 'Totals' value, as shown here:
>>> collection.aggregate([
... {'$unwind' : '$Tags'},
... {'$group' : {'_id' : '$Tags', 'Totals' : {'$sum' : 1}}},
... {'$sort' : SON([('Totals', -1)])}
... ])
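The effect of the $sort stage can also be sketched in plain Python. The rows below are copied from the output shown earlier. The sort_stage() helper is a hypothetical stand-in, not a PyMongo function, and takes an ordered list of (field, direction) pairs to mirror the way SON preserves key order.

```python
# A few rows copied from the earlier aggregate() output, deliberately shuffled.
grouped = [
    {"_id": "Storage", "Totals": 1},
    {"_id": "Laptop", "Totals": 4},
    {"_id": "In Use", "Totals": 3},
]


def sort_stage(rows, spec):
    """Sort rows by an ordered list of (field, direction) pairs.

    direction is 1 for ascending and -1 for descending, as in $sort.
    """
    result = list(rows)
    # Apply the keys from last to first; Python's sort is stable, so the
    # first key in `spec` ends up as the primary sort key.
    for field, direction in reversed(spec):
        result.sort(key=lambda row: row[field], reverse=(direction == -1))
    return result


sorted_rows = sort_stage(grouped, [("Totals", -1)])
for row in sorted_rows:
    print(row)
```

The order of the keys matters once more than one field is involved, which is why the pipeline passes an ordered SON object instead of relying on the key order of a plain dict. Also note that in PyMongo 3.0 and later, aggregate() returns a cursor to iterate over instead of a dict with a 'result' key.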