▪ MongoDB can effectively reuse space freed by removing entire collections without leading to data fragmentation.
Nevertheless, this operation adds some complexity to queries if any of your analyses depend on events that may reside in both the current and previous collections. For most real-time data-collection systems, this approach is ideal.
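To make the query-side cost concrete, here is a minimal Python sketch of how an application would determine which rotated collections a time-range query must fan out over. The per-day `events_YYYYMMDD` naming scheme is an illustrative assumption, not something the text prescribes:

```python
from datetime import date, timedelta

def collections_for_range(start: date, end: date) -> list[str]:
    """Return the rotated collection names a query over [start, end]
    must touch, assuming one collection per day named events_YYYYMMDD
    (a hypothetical rotation scheme)."""
    names, day = [], start
    while day <= end:
        names.append(f"events_{day:%Y%m%d}")
        day += timedelta(days=1)
    return names
```

A query spanning a rotation boundary would then run once per returned collection and merge the results in the application, which is the extra complexity the text refers to.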
Multiple databases
Strategy: Rotate databases rather than collections, as was done in Multiple collections, single database.
While this significantly increases application complexity for insertions and queries, when you drop old databases MongoDB returns disk space to the filesystem. This approach makes the most sense in scenarios where your event insertion rates and/or data retention requirements are extremely variable.
For example, if you are performing a large backfill of event data and want to make sure that the entire set of event data for 90 days is available during the backfill, and during normal operations you only need 30 days of event data, you might consider using multiple databases.
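A minimal Python sketch of the rotation bookkeeping, assuming one database per calendar month named `events_YYYYMM` (a hypothetical scheme; in practice you would pass the returned names to pymongo's `client.drop_database()` to reclaim the disk space):

```python
from datetime import date, timedelta

def db_name_for(day: date) -> str:
    # One database per calendar month, e.g. "events_202403" (assumed scheme).
    return f"events_{day:%Y%m}"

def databases_to_drop(existing: list[str], today: date,
                      retain_days: int = 90) -> list[str]:
    """Return the databases that fall entirely outside the retention
    window. Zero-padded names sort chronologically, so a lexicographic
    comparison against the cutoff month's name suffices."""
    cutoff = db_name_for(today - timedelta(days=retain_days))
    return sorted(name for name in existing if name < cutoff)
```

Shrinking `retain_days` from 90 to 30 after a backfill completes is exactly the variable-retention case described above: the next sweep drops the now-unneeded months wholesale.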
Pre-Aggregated Reports
Although efficiently inserting event and log data into MongoDB and querying those log records is useful, higher-level aggregation is often far more valuable for turning raw data into actionable information. In this section, we'll explore techniques to calculate and store pre-aggregated (or pre-canned) reports in MongoDB using incremental updates.
Solution Overview
This section outlines the basic patterns and principles for using MongoDB as an engine for
collecting and processing events in real time for use in generating up-to-the-minute or up-to-
the-second reports. We make the following assumptions about real-time analytics:
▪ You require up-to-the-minute data, or up-to-the-second if possible.
▪ The queries for ranges of data (by time) must be as fast as possible.
▪ Servers generating events that need to be aggregated have access to the MongoDB instance.
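The incremental-update idea can be sketched as follows: each event atomically bumps pre-aggregated counters in a per-day document via `$inc`, so range queries read a handful of small documents instead of scanning raw events. The `_id` scheme and field names here are illustrative assumptions; the returned pair is what you would feed to a pymongo upsert such as `collection.update_one(filter_doc, update_doc, upsert=True)`:

```python
from datetime import datetime

def build_hit_update(ts: datetime) -> tuple[dict, dict]:
    """Build the (filter, update) pair recording one page view at ts.
    One document per day (hypothetical "pageviews/YYYYMMDD" _id scheme)
    holds daily, hourly, and per-minute counters, all bumped atomically
    with $inc; upsert=True creates the document on first use."""
    filter_doc = {"_id": f"pageviews/{ts:%Y%m%d}"}
    update_doc = {"$inc": {
        "daily": 1,
        f"hourly.{ts.hour}": 1,
        f"minute.{ts.hour}.{ts.minute}": 1,
    }}
    return filter_doc, update_doc
```

Because each event is a single atomic upsert, the counters stay up to the second, and a report for any time range only has to read the few daily documents it spans.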