Given these axes, we can describe the primary users of monitoring information as follows:
Operational Health is typical monitoring, where exceptional situations are detected and
alerts are generated. It is the most demanding use case. The resolution and latency must be
sufficient to detect problems and respond to them within an SLA. This usually demands
up-to-date access to all metrics, real-time computation for high-speed analysis, and reliable
alerting. The storage system must be high speed and high volume at the same time. In fact,
this kind of monitoring stresses every part of the monitoring infrastructure.
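
As a rough illustration of why operational health monitoring depends on both fresh data and reliable alerting, the following minimal Python sketch treats a stale metric as an alert condition in its own right. The names Sample and check_metric, and the threshold and max_age_s parameters, are hypothetical and do not come from any particular monitoring system.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    timestamp: float  # seconds since the epoch (assumed unit)
    value: float      # measured value, e.g. an error rate


def check_metric(samples: Sequence[Sample], now: float,
                 threshold: float, max_age_s: float) -> Optional[str]:
    """Return an alert reason, or None if the metric looks healthy.

    Hypothetical helper: a real system would also handle paging,
    deduplication, and escalation.
    """
    if not samples:
        return "no data"
    # Only the most recent sample matters for a real-time health check.
    latest = max(samples, key=lambda s: s.timestamp)
    # If the newest sample is too old, problems cannot be detected in
    # time to meet the SLA, so staleness itself raises an alert.
    if now - latest.timestamp > max_age_s:
        return "stale data"
    if latest.value > threshold:
        return "threshold exceeded"
    return None
```

A caller would run a check like this for every metric on each evaluation cycle, choosing max_age_s so that detection plus response still fits within the SLA.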
Quality Assurance usually involves medium- or long-term analysis for specific quality
metrics such as variability. For example, some queries should always take approximately
the same amount of time with little variation. Quality assurance detects this kind of variability
just as an auto assembly line quality assurance team looks for defects and unacceptable
variations in the product being built. For this reason, Quality Assurance needs high-resolution
data, but latency is not critical since the data is often processed in batches after the fact.
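
One simple way to quantify this kind of variability in a batch job is the coefficient of variation (standard deviation divided by mean) of recorded query durations. The sketch below is only an illustration under that assumption. The function names and the 10 percent limit are hypothetical, and a real quality metric might use percentiles instead.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(durations: Sequence[float]) -> float:
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(durations)
    if mean == 0:
        raise ValueError("mean duration is zero; variation is undefined")
    return statistics.pstdev(durations) / mean


def is_too_variable(durations: Sequence[float], max_cv: float = 0.10) -> bool:
    """Flag a query whose run times vary more than the allowed ratio.

    max_cv is an assumed limit for illustration, not a recommended value.
    """
    return coefficient_of_variation(durations) > max_cv


# Batch analysis after the fact: durations in milliseconds for two queries.
steady = [100.0, 101.0, 99.0, 100.0, 100.0]
jittery = [100.0, 40.0, 180.0, 95.0, 85.0]
```

Both sample lists have the same mean of 100 ms, so an average alone would not tell them apart. Only the full-resolution samples reveal that the second query is erratic, which is why this use case needs high-resolution data even though it can tolerate latency.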
Quality Assurance also includes information required when finding and fixing bugs,
such as debug logs, process traces, stack traces, coredumps, and profiler output.
Capacity Planning (CP) is the process of predicting resource needs in the future. These
predictions require coarse metrics such as the current number of machines, amount of network
bandwidth used, cost per user, and machine utilization and efficiency, as well as alerting
when resources are running low. CP is also concerned with how resource use changes