Tip #12: Compute aggregations as you go
Whenever possible, compute aggregations over time with $inc. For example, in “Tip
#7: Pre-populate anything you can” on page 8, we have an analytics application with
stats by the minute and the hour. We can increment the hour stats at the same time
that we increment the minute ones.
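As a rough sketch of that idea, the helper below builds a single update document that bumps both counters in one operation. The field names `minutes` and `hours`, and the layout of one counter slot per minute of the day, are assumptions made for illustration; the layout from Tip #7 may differ.

```python
from datetime import datetime


def build_visit_update(ts: datetime) -> dict:
    """Build one $inc update that bumps the minute and hour counters together."""
    minute_of_day = ts.hour * 60 + ts.minute
    return {
        "$inc": {
            f"minutes.{minute_of_day}": 1,  # hypothetical per-minute counter slot
            f"hours.{ts.hour}": 1,          # hypothetical per-hour counter slot
        }
    }


update = build_visit_update(datetime(2024, 1, 1, 14, 30))
print(update)  # {'$inc': {'minutes.870': 1, 'hours.14': 1}}
```

With a driver such as PyMongo, a document like this would typically be passed as the second argument to `update_one`, so both counters change in the same write.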
If your aggregations need more munging (for example, finding the average number of
queries over the hour), store the data in the minutes field and then have an ongoing
batch process that computes averages from the latest minutes. As all of the information
necessary to compute the aggregation is stored in one document, this processing could
even be passed off to the client for newer (unaggregated) documents. Older documents
would have already been tallied by the batch job.
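A minimal sketch of that averaging step, assuming the document keeps its per-minute counts in a flat `minutes` array (a hypothetical layout). Because everything needed is in the one document, the same function could run in the batch job or on the client.

```python
def hourly_averages(minutes: list) -> list:
    """Average count per minute for each hour (or trailing partial hour)."""
    averages = []
    for start in range(0, len(minutes), 60):
        chunk = minutes[start:start + 60]  # never empty: start < len(minutes)
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical document: one full hour at 2 queries/minute, then 30 minutes at 4.
doc = {"minutes": [2] * 60 + [4] * 30}
print(hourly_averages(doc["minutes"]))  # [2.0, 4.0]
```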
Tip #13: Write code to handle data integrity issues
Given MongoDB's schemaless nature and the advantages to denormalizing, you'll need
to keep your data consistent in your application.
Many ODMs have ways of enforcing consistent schemas to various levels of strictness.
However, there are also the consistency issues brought up above: data inconsistencies
caused by system failures (“Tip #1: Duplicate data for speed, reference data for
integrity” on page 1) and limitations of MongoDB's updates (“Tip #10: Design documents
to be self-sufficient” on page 12). For these types of inconsistencies, you'll need to
write a script that checks your data.
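One possible shape for such a check, shown here on plain dictionaries rather than a live database. The `users` and `posts` collections and the duplicated `author_name` field are hypothetical, chosen only to illustrate comparing a denormalized copy against its source.

```python
def find_stale_copies(users: dict, posts: list) -> list:
    """Return (post_id, stored_name, expected_name) for each out-of-date copy."""
    stale = []
    for post in posts:
        source = users.get(post["author_id"])
        expected = source["name"] if source else None  # None: author is missing
        if post.get("author_name") != expected:
            stale.append((post["_id"], post.get("author_name"), expected))
    return stale


users = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
posts = [
    {"_id": "p1", "author_id": 1, "author_name": "Ada"},
    {"_id": "p2", "author_id": 2, "author_name": "G. Hopper"},  # stale duplicate
]
print(find_stale_copies(users, posts))  # [('p2', 'G. Hopper', 'Grace')]
```

A real script would read both sides from the database and then either repair the stale copies or report them.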
If you follow the tips in this chapter, you might end up with quite a few cron jobs,
depending on your application. For example, you might have:
Consistency fixer
    Check computations and duplicate data to make sure that everyone has consistent
    values.
Pre-populator
    Create documents that will be needed in the future.
Aggregator
    Keep inline aggregations up-to-date.
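As an illustration of the pre-populator, the sketch below builds tomorrow's empty stats documents so that later updates only ever increment existing fields. The document shape (`_id` format, `minutes`, `hours`) is an assumption for this example, not a prescribed schema.

```python
from datetime import date, timedelta


def empty_stats_doc(page_id: str, day: date) -> dict:
    """Build a zero-filled stats document for one page and one day."""
    return {
        "_id": f"{page_id}:{day.isoformat()}",  # hypothetical key format
        "page": page_id,
        "day": day.isoformat(),
        "minutes": [0] * (24 * 60),  # one slot per minute of the day
        "hours": [0] * 24,           # one slot per hour of the day
    }


def docs_for_tomorrow(page_ids: list, today: date) -> list:
    """Return the documents a nightly job would insert ahead of time."""
    tomorrow = today + timedelta(days=1)
    return [empty_stats_doc(page_id, tomorrow) for page_id in page_ids]


docs = docs_for_tomorrow(["home", "about"], date(2024, 1, 31))
print([d["_id"] for d in docs])  # ['home:2024-02-01', 'about:2024-02-01']
```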
Other useful scripts (not strictly related to this chapter) might be:
Schema checker
    Make sure the set of documents currently being used all have a certain set of fields,
    either correcting them automatically or notifying you about incorrect ones.
Backup job
    fsync, lock, and dump your database at regular intervals.
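A schema checker along these lines might look like the following sketch, which fills in missing fields when a default is known and reports the rest. The required field names and defaults are hypothetical.

```python
def check_schema(doc: dict, required: set, defaults: dict) -> tuple:
    """Return (corrected_copy, unfixable_fields) for one document."""
    fixed = dict(doc)  # shallow copy, so the caller's document is not modified
    unfixable = []
    for field in sorted(required):
        if field in fixed:
            continue
        if field in defaults:
            fixed[field] = defaults[field]  # correct automatically
        else:
            unfixable.append(field)         # needs a human to look at it
    return fixed, unfixable


required = {"name", "email", "visits"}  # hypothetical required fields
defaults = {"visits": 0}                # only some fields have a safe default
fixed, unfixable = check_schema({"name": "Ada"}, required, defaults)
print(fixed)      # {'name': 'Ada', 'visits': 0}
print(unfixable)  # ['email']
```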
 