17.2.4 Central versus Regional Collectors
Some monitoring systems scale to global size by having a collector run in each region and
relay the data collected to the main monitoring system. This type of collector is called a
remote monitoring station or aggregator. An aggregator might be placed in each datacenter
or geographical region. It may receive metrics by push or pull, and it generally consolidates
the information and transmits it to the main system in a more efficient manner. This
approach may be used to scale a system globally, saving bandwidth between each datacenter.
Alternatively, it may be done to scale a system up; each aggregator may be able to handle
a certain number of devices.
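The consolidation step can be illustrated with a minimal sketch. It assumes the aggregator summarizes raw samples per host and metric and forwards one summary record per series instead of every sample. The names RegionalAggregator, receive, flush, and the send callback are hypothetical, chosen only for illustration, and do not come from any particular monitoring product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Summary:
    """Running summary of the samples seen for one (host, metric) series."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


class RegionalAggregator:
    """Hypothetical aggregator: collects locally, forwards summaries."""

    def __init__(self, region, send):
        self.region = region
        self.send = send  # assumed callable that delivers a batch upstream
        self.pending = defaultdict(Summary)

    def receive(self, host, metric, value):
        # Called for each sample, whether it arrived by push or by pull.
        self.pending[(host, metric)].add(value)

    def flush(self):
        # Build one record per series, then transmit them as a single batch.
        batch = [
            {
                "region": self.region,
                "host": host,
                "metric": metric,
                "count": s.count,
                "mean": s.total / s.count,
                "min": s.minimum,
                "max": s.maximum,
            }
            for (host, metric), s in sorted(self.pending.items())
        ]
        self.pending.clear()
        if batch:
            self.send(batch)
        return len(batch)
```

In this sketch, three samples from one host cross the inter-datacenter link as a single record, which is where the bandwidth saving would come from. A real aggregator might forward raw samples in compressed batches instead; summarizing is only one possible way to consolidate.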
17.3 Analysis and Computation
Once the data has been collected, it can be used and interpreted. Analysis extracts meaning
from raw data. Analysis is the most important component because it produces the results
that justify having a monitoring system in the first place.
Real-time analysis examines the data as it is collected. It is generally the most
computationally expensive analysis and is reserved for critical tasks such as determining
conditions where someone should be alerted. To do this efficiently, monitoring systems tee
the data as it is collected and send one copy to the storage system and another to the
real-time analysis system. Alternatively, storage systems may hold copies of recently
collected metrics in RAM. For example, by keeping the last hour's worth of metrics in RAM
and constructing all alerting rules to refer to only the last hour of history, hundreds or
thousands of alert rules
can be processed efficiently.
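The tee and the in-RAM window can be sketched as follows. This is a minimal illustration, assuming samples arrive in nondecreasing timestamp order and that storage is anything with an append method. RecentWindow and tee are hypothetical names, not part of any specific system.

```python
from collections import deque


class RecentWindow:
    """Holds only the samples newer than max_age seconds, in memory."""

    def __init__(self, max_age=3600.0):
        self.max_age = max_age
        self.samples = deque()  # (timestamp, name, value), oldest first

    def add(self, timestamp, name, value):
        self.samples.append((timestamp, name, value))
        # Evict everything older than the window, measured from the
        # newest sample (assumes timestamps never go backward).
        cutoff = timestamp - self.max_age
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def values(self, name):
        return [v for _, n, v in self.samples if n == name]


def tee(sample, storage, window):
    """Send one copy of the sample to storage and one to the window."""
    storage.append(sample)  # stand-in for the on-disk storage system
    window.add(*sample)     # copy used by real-time analysis
```

Because alert rules in this sketch can only see what is in the window, every rule evaluation reads from memory and never touches the long-term store.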
Typically real-time analysis involves dozens or hundreds of alert rules that are
simultaneously processed to find exceptional situations, called triggers. Sample triggers
include a service being down, HTTP responses from a server exceeding n ms for x minutes, or
the amount of free disk space dropping below m gigabytes. The real-time analysis system
includes a language for writing formulas that describe these situations.
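As a rough illustration of such formulas, the sketch below expresses two of the sample triggers as plain Python functions rather than a dedicated rule language. The metric names (http_ms, disk_free_gb), the thresholds, and the helper names are all assumptions made for this example. Note that sustained_above is a simplification: it only checks the samples that happen to fall in the window and does not verify that they cover the whole duration.

```python
def sustained_above(samples, threshold, duration, now):
    """True if the window holds samples and all of them exceed threshold.

    samples is a list of (timestamp, value) pairs; the window covers
    the last `duration` seconds up to and including `now`.
    """
    recent = [v for t, v in samples if now - duration <= t <= now]
    return bool(recent) and all(v > threshold for v in recent)


def latest_below(samples, threshold):
    """True if the most recent sample is under threshold."""
    return bool(samples) and samples[-1][1] < threshold


# Hypothetical rule table: trigger name -> formula over the metrics.
RULES = {
    # HTTP responses exceed 500 ms for 5 minutes (n=500, x=5).
    "http_slow": lambda m, now: sustained_above(
        m.get("http_ms", []), 500, 300, now),
    # Free disk space drops below 10 gigabytes (m=10).
    "disk_low": lambda m, now: latest_below(
        m.get("disk_free_gb", []), 10),
}


def evaluate(metrics, now, rules=RULES):
    """Return the names of all triggers that fire, in sorted order."""
    return sorted(name for name, rule in rules.items() if rule(metrics, now))
```

A dedicated rule language would typically add parsing, units, and per-host grouping; the table of named predicates here is only meant to show the shape of the evaluation loop.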
This analysis may also detect situations that are not so critical as to require immediate
attention, but if left unattended could create a more significant problem. Such situations
should generate tickets rather than alerts. See Section 14.1.7 for problem classifications.
Alerts should be reserved for problems that do require immediate attention. When an alert
is triggered, it prompts the alerting and escalation manager, described later in this chapter,
to take action.
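The distinction between tickets and alerts can be sketched as a small routing step. This is an illustrative assumption about how a fired trigger might be dispatched. Urgency, route, page, and open_ticket are hypothetical names, and the two callbacks stand in for the alerting and escalation manager and a ticket system.

```python
from enum import Enum


class Urgency(Enum):
    IMMEDIATE = "immediate"  # needs attention now
    DEFERRED = "deferred"    # could grow into a problem if left unattended


def route(trigger_name, urgency, page, open_ticket):
    """Send a fired trigger to the alert path or the ticket path."""
    if urgency is Urgency.IMMEDIATE:
        page(trigger_name)       # hand off to alerting and escalation
        return "alert"
    open_ticket(trigger_name)    # queue for normal working hours
    return "ticket"
```

Keeping the urgency on the rule, rather than deciding it at dispatch time, is one way to make sure alerts stay reserved for problems that need immediate attention.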
Short-term analysis examines data that was collected in the last day, week, or month.
Generally dashboards fit into this category. They are updated infrequently, often every few
minutes or on demand when someone calls up the specific web page. Short-term analysis
usually queries the on-disk copy of the stored metrics. Near-term analysis is also used to