APIs that hide it from the user. If not, applications can use SON instances instead of
dictionaries to ensure their documents maintain key order.
MongoDB for Real-Time Analytics
MongoDB is well suited to tracking metrics in real time, for several reasons:
• Upsert operations (see Chapter 3) allow us to send a single message to either create
a new tracking document or increment the counters on an existing document.
• The upsert we send does not wait for a response; it's fire-and-forget. This allows
our application code to avoid blocking on each analytics update. We don't need
to wait and see whether the operation is successful, because an error in analytics
code wouldn't get reported to a user anyway.
• We can use an $inc update to increment a counter without having to do a separate
query and update operation. We also eliminate any contention issues if multiple
updates are happening simultaneously.
• MongoDB's update performance is very good, so doing one or more updates per
request for analytics is reasonable.
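To make the upsert-plus-$inc behavior described above concrete, here is a minimal in-memory sketch. It is not MongoDB code: the function inc_upsert and the list docs are hypothetical stand-ins for the server and a collection, used only to show the semantics (create the document if nothing matches, otherwise increment its counter).

```python
def inc_upsert(collection, query, inc):
    """Apply an {"$inc": inc} update with upsert semantics to a list of dicts.

    Hypothetical in-memory stand-in for the server, for illustration only.
    """
    for doc in collection:
        if all(doc.get(key) == value for key, value in query.items()):
            # Match found: bump each counter in place (a missing field counts as 0).
            for field, amount in inc.items():
                doc[field] = doc.get(field, 0) + amount
            return doc
    # No match: create the document from the query fields plus the increments.
    new_doc = dict(query)
    for field, amount in inc.items():
        new_doc[field] = new_doc.get(field, 0) + amount
    collection.append(new_doc)
    return new_doc


docs = []
inc_upsert(docs, {"url": "/foo"}, {"views": 1})  # no match, so the document is created
inc_upsert(docs, {"url": "/foo"}, {"views": 1})  # match, so the counter is incremented
inc_upsert(docs, {"url": "/bar"}, {"views": 1})  # a second document is created
```

Against a real server, the same query and increment would presumably be sent in one message, for example with PyMongo's update_one(query, {"$inc": inc}, upsert=True). The sketch does not model concurrency; on the server, the increment is applied atomically to the document, which is what removes the contention concern.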
Schema
In our example we will be tracking page views for our site, with hourly roll-ups. We'll
track both total page views and page views for each individual URL. The goal is
to end up with a collection, hourly, containing documents like this:
{ "hour" : "Tue Jun 15 2010 9:00:00 GMT-0400 (EDT)", "url" : "/foo", "views" : 5 }
{ "hour" : "Tue Jun 15 2010 9:00:00 GMT-0400 (EDT)", "url" : "/bar", "views" : 5 }
{ "hour" : "Tue Jun 15 2010 10:00:00 GMT-0400 (EDT)", "url" : "/", "views" : 12 }
{ "hour" : "Tue Jun 15 2010 10:00:00 GMT-0400 (EDT)", "url" : "/bar", "views" : 3 }
{ "hour" : "Tue Jun 15 2010 10:00:00 GMT-0400 (EDT)", "url" : "/foo", "views" : 10 }
{ "hour" : "Tue Jun 15 2010 11:00:00 GMT-0400 (EDT)", "url" : "/foo", "views" : 21 }
{ "hour" : "Tue Jun 15 2010 11:00:00 GMT-0400 (EDT)", "url" : "/", "views" : 3 }
...
Each document represents all of the page views for a single URL in a given hour. If a
URL gets no page views in an hour, there is no document for it. To track total page
views for the entire site, we'll use a separate collection, hourly_totals, containing
documents like these:
{ "hour" : "Tue Jun 15 2010 9:00:00 GMT-0400 (EDT)", "views" : 10 }
{ "hour" : "Tue Jun 15 2010 10:00:00 GMT-0400 (EDT)", "views" : 25 }
{ "hour" : "Tue Jun 15 2010 11:00:00 GMT-0400 (EDT)", "views" : 24 }
...
The difference here is just that we don't need a "url" key, because we're doing site-
wide tracking. If our entire site doesn't get any page views during an hour, there will
be no document for that hour.
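As a rough sketch of how a single page view could map onto this schema, the code below truncates a timestamp to the start of its hour and builds one query/update pair per collection. The names hour_bucket and page_view_updates are hypothetical, and storing the hour as a datetime is an assumption; the text only fixes the key names "hour", "url", and "views".

```python
from datetime import datetime


def hour_bucket(moment):
    """Truncate a datetime to the start of its hour."""
    return moment.replace(minute=0, second=0, microsecond=0)


def page_view_updates(url, moment):
    """Build (query, update) pairs for one page view, keyed by collection name."""
    hour = hour_bucket(moment)
    increment = {"$inc": {"views": 1}}
    return {
        # Per-URL roll-up: one document per URL per hour.
        "hourly": ({"hour": hour, "url": url}, increment),
        # Site-wide roll-up: one document per hour, with no "url" key.
        "hourly_totals": ({"hour": hour}, increment),
    }


updates = page_view_updates("/foo", datetime(2010, 6, 15, 9, 42, 7))
```

Each pair is meant to be sent as an upsert, so the first view of a URL in a given hour creates its document and later views increment it. Because nothing is written for hours with no traffic, no empty documents appear in either collection.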
 