space, CPU, and network capacity from becoming limiting factors. The accuracy and
precision of collected data should be monitored; one should monitor how often an attempt to
collect a measurement fails. The display of information must be faster than the average
person's attention span. The freshness of the data used to calculate a KPI should be
monitored; knowing the age of data used to calculate a KPI may be as important as the KPI
itself. Knowing how KPI latency is trending is important to understanding the health of the
monitoring system.
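The meta-monitoring signals just described (how often collection fails, how old the data behind a KPI is, and whether that age is trending upward) can be tracked with a few counters. The following is a minimal sketch under stated assumptions: the class and function names are hypothetical, timestamps are Unix seconds, and attempts are assumed to be recorded in time order. It is not taken from any particular monitoring product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CollectorStats:
    """Meta-monitoring counters for one (hypothetical) data collector."""
    attempts: int = 0
    failures: int = 0
    last_success_ts: Optional[float] = None  # Unix time of newest good sample

    def record(self, ok: bool, ts: float) -> None:
        """Record one collection attempt made at time ts (assumed in time order)."""
        self.attempts += 1
        if ok:
            self.last_success_ts = ts
        else:
            self.failures += 1

    def failure_rate(self) -> float:
        """Fraction of collection attempts that failed (0.0 if none yet)."""
        return self.failures / self.attempts if self.attempts else 0.0

    def data_age(self, now: float) -> Optional[float]:
        """Seconds since the newest successful sample, i.e. how stale a KPI
        computed from this collector's data would be; None if no data yet."""
        if self.last_success_ts is None:
            return None
        return now - self.last_success_ts


def latency_trend(ages: List[float]) -> float:
    """Least-squares slope of successive data-age readings.
    A positive slope means KPIs are computed from increasingly stale data."""
    n = len(ages)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(ages) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(ages))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

For example, after successful attempts at t=100 and t=220 and failed attempts at t=160 and t=280, `failure_rate()` returns 0.5 and `data_age(300.0)` returns 80.0. Feeding periodic `data_age` readings into `latency_trend` gives one simple way to see whether KPI latency is getting worse.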
One meta-monitoring technique is to deploy a second monitoring system that monitors
the primary system. It should share as few dependencies with the primary as possible. If different
software is used, it is less likely that the same bug will take down both systems. This technique,
however, adds complexity, requires additional training, and necessitates maintenance.
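At its simplest, the secondary system only needs to notice when the primary has gone quiet. The sketch below assumes the primary emits a periodic heartbeat that the secondary records; the function name, the status strings, and the 120-second threshold are all hypothetical choices for illustration.

```python
from typing import Optional


def primary_status(last_heartbeat: Optional[float], now: float,
                   max_silence: float = 120.0) -> str:
    """Classify the primary monitoring system from the secondary's point of view.

    last_heartbeat: Unix time the secondary last heard from the primary,
                    or None if it never has.
    max_silence:    hypothetical threshold, in seconds, before declaring DOWN.
    """
    if last_heartbeat is None:
        return "UNKNOWN"  # never heard from the primary
    if now - last_heartbeat > max_silence:
        return "DOWN"     # primary is silent; alert via a path it does not share
    return "UP"
```

Whatever the check looks like, the alert it raises should travel over a channel the primary does not also depend on; otherwise a shared failure can silence both systems at once.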
Another technique is to divide the network into two parts, each with its own monitoring
system. The two monitoring systems can also monitor each other. For example, a site with
two datacenters might deploy a different monitoring system in each one that monitors local
machines. This saves on inter-datacenter bandwidth and removes the interconnection as a
source of failure.
With more than two datacenters, a similar arrangement can be used: pairs of datacenters
can monitor each other, or each monitoring system can monitor another system in a
big circle.
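Both multi-datacenter arrangements amount to deciding which site's monitoring system watches which other site's. Here is a minimal sketch, assuming sites are identified by hypothetical names such as `"dc1"` and listed in a fixed order; the function names are illustrative only.

```python
from typing import Dict, List


def ring_assignments(sites: List[str]) -> Dict[str, str]:
    """Each site's monitoring system watches the next site's; the last wraps to the first."""
    if len(sites) < 2:
        raise ValueError("need at least two sites")
    return {site: sites[(i + 1) % len(sites)] for i, site in enumerate(sites)}


def pair_assignments(sites: List[str]) -> Dict[str, str]:
    """Sites are paired in list order and each member of a pair watches the other."""
    if len(sites) < 2 or len(sites) % 2:
        raise ValueError("need an even number of sites, at least two")
    watches: Dict[str, str] = {}
    for a, b in zip(sites[0::2], sites[1::2]):
        watches[a] = b
        watches[b] = a
    return watches
```

For `["dc1", "dc2", "dc3"]`, the ring yields dc1 watching dc2, dc2 watching dc3, and dc3 watching dc1. In both schemes every monitoring system is watched by exactly one other, and with only two sites both reduce to the two-datacenter mutual arrangement. One trade-off worth noting: in a ring, if one site's monitoring system fails, the site it was watching goes unwatched until it recovers.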
16.6 Logs
Another way of gaining visibility into the system is through analysis of logging data. While
not directly related to monitoring, we mention this capability here because of the visibility
it brings. There are many kinds of logs:
Web “Hit” Logs: Web servers generally log each HTTP access along with statistics
about performance, where the access came from, and whether the access was a suc-
cess, error, redirect, and so on. This data can be used for a multitude of business
purposes: determining page generation times, analyzing where users come from,
tracking the paths users take through the system, and more. Technical operations
can use this data to analyze and improve page load time and latency.
API Logs: Logging each API call generally involves storing who made the call,
the input parameters, and output results (often summarized as a simple success or
error code). API logs can be useful for billing, security forensics, and feature usage
patterns. A team that is going to eliminate certain obsolete API calls can use logs
to determine which users will be affected, if any.
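To illustrate the web hit log entry above, the sketch below tallies how many requests succeeded, failed, or were redirected. It assumes the widely used Common Log Format (`host ident authuser [date] "request" status bytes`); many servers can be configured to append further fields such as response time, which this sketch ignores. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def classify(status: int) -> str:
    """Map an HTTP status code to a coarse outcome class."""
    if 200 <= status < 300:
        return "success"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 600:
        return "error"
    return "other"


def outcome_counts(lines: Iterable[str]) -> Counter:
    """Count hits per outcome class; lines that do not parse are counted too."""
    counts: Counter = Counter()
    for line in lines:
        m = CLF.match(line)
        if m is None:
            counts["unparsed"] += 1  # worth tracking: a format change shows up here
            continue
        counts[classify(int(m.group("status")))] += 1
    return counts
```

Given one 200, one 301, and one 404 line plus one malformed line, `outcome_counts` reports one each of `success`, `redirect`, `error`, and `unparsed`.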
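The API log use case mentioned above, finding out who would be affected by retiring obsolete calls, reduces to a scan over the log. This is a minimal sketch assuming each record has already been parsed into a caller, a call name, and a success flag; the record type, field names, and call names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class ApiLogRecord:
    caller: str  # who made the call (hypothetical field name)
    call: str    # which API call was made
    ok: bool     # result summarized as success or error


def affected_callers(records: Iterable[ApiLogRecord],
                     obsolete: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each caller that used an obsolete call to the obsolete calls it used.

    Failed calls are included on purpose: a caller whose request errored
    still depends on the call existing.
    """
    obsolete_set = set(obsolete)
    affected: Dict[str, Set[str]] = {}
    for r in records:
        if r.call in obsolete_set:
            affected.setdefault(r.caller, set()).add(r.call)
    return affected
```

An empty result means no caller in the logged period used the calls, which answers the "if any" part of the question; the answer is only as complete as the time window the logs cover.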