not receiving notifications. The problem or outage is still really happening and any
dashboards or displays will reflect that fact.
Alternatively, one can implement silences in the real-time analysis system. This
approach is called an inhibit. In this case the alert does not trigger and as a result no
alerts are sent. An inhibit can be implemented by a mechanism where an alert rule
specifies it should not be evaluated (calculated) if one or more prerequisite alert rules
are currently triggering. Alternatively, the formula language could include a Boolean
function that returns true or false depending on whether the alert is triggering. This
function would be used to short-circuit the evaluation of the rule.
For example, an alert rule to warn of a high error rate might be inhibited if the database
in use is offline. There is no sense in getting alerted for something you can't do anything
about. Monitoring the database is handled by another rule, or by another team. The alert
rule might look like:
IF http-500-rate > 1% THEN
    ALERT(error-rate-too-high)
    UNLESS ACTIVE_ALERT(database-offline)
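
The same idea can be sketched in Python. This is only an illustration of the prerequisite-rule mechanism, not the syntax of any particular monitoring product. The AlertRule class, the evaluate function, and the metric names db_up and http_500_rate are all hypothetical. The sketch also assumes that rules are listed so that a prerequisite appears before the rules it inhibits.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class AlertRule:
    """Hypothetical alert rule: a name, a condition, and optional prerequisites."""
    name: str
    condition: Callable[[Dict[str, float]], bool]  # True when the rule should fire
    inhibited_by: List[str] = field(default_factory=list)  # prerequisite alert names


def evaluate(rules: List[AlertRule], metrics: Dict[str, float]) -> Set[str]:
    """Return the names of the alerts that are triggering.

    Rules are evaluated in list order, so a prerequisite rule must appear
    before any rule it inhibits (an assumption of this sketch).
    """
    active: Set[str] = set()
    for rule in rules:
        # Inhibit: skip evaluation entirely if a prerequisite alert is active.
        if any(name in active for name in rule.inhibited_by):
            continue
        if rule.condition(metrics):
            active.add(rule.name)
    return active


rules = [
    AlertRule("database-offline", lambda m: m["db_up"] == 0),
    AlertRule(
        "error-rate-too-high",
        lambda m: m["http_500_rate"] > 0.01,  # 1% expressed as a fraction
        inhibited_by=["database-offline"],
    ),
]

# Database down: only the database alert triggers; the error-rate rule is inhibited.
db_down = evaluate(rules, {"db_up": 0, "http_500_rate": 0.25})

# Database up: the error-rate rule is evaluated normally.
db_up = evaluate(rules, {"db_up": 1, "http_500_rate": 0.25})
```

With the database offline, the error-rate rule is never evaluated, so it cannot trigger or send an alert. This is the short-circuit behavior described above.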
Silence Creation UI Advice
We learned the hard way that people are bad at doing math related to dates, times,
and time zones, especially when stress is high and alerts are blaring. It is also easy
to make mistakes when wildcards or regular expressions can be used to specify
what to silence. Therefore we recommend the following UI features:
• The default start time should be “now.”
• It should be possible to enter the end time as a duration in minutes, hours, or
days, as well as a specific time and date in any time zone.
• So that the user may check his or her work, the UI should display what will be
silenced and require confirmation before it is activated.
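
The recommendations above can be sketched as a small Python helper. It is a minimal illustration under stated assumptions, not a description of any existing tool. The function names, the duration syntax ("90m", "2h", "3d"), and the shell-style wildcard matching are all hypothetical choices. Absolute end times are assumed to be ISO 8601 strings with an explicit UTC offset, and any "now" passed in is assumed to be time zone aware.

```python
import fnmatch
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# Hypothetical duration suffixes accepted by this sketch.
UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse a duration such as '90m', '2h', or '3d'."""
    text = text.strip().lower()
    if len(text) < 2 or text[-1] not in UNITS or not text[:-1].isdigit():
        raise ValueError(f"expected a duration like '90m', '2h', or '3d', got {text!r}")
    return timedelta(**{UNITS[text[-1]]: int(text[:-1])})


def silence_window(end: str, now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return (start, end) in UTC. The start time defaults to "now".

    `end` is either a duration ('2h') or an ISO 8601 timestamp with an
    explicit UTC offset ('2024-03-01T18:00:00-05:00').
    """
    start = now or datetime.now(timezone.utc)
    try:
        finish = start + parse_duration(end)
    except ValueError:
        # Not a duration, so treat it as an absolute time in any time zone.
        finish = datetime.fromisoformat(end)
        if finish.tzinfo is None:
            raise ValueError("absolute end times must include a UTC offset")
        finish = finish.astimezone(timezone.utc)
    if finish <= start:
        raise ValueError("the silence must end after it starts")
    return start, finish


def preview(pattern: str, alert_names: List[str]) -> List[str]:
    """List the alerts a wildcard pattern would silence, for confirmation."""
    return sorted(n for n in alert_names if fnmatch.fnmatchcase(n, pattern))
```

A UI built on helpers like these would show the computed window and the preview list, and activate the silence only after the user confirms both.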
Inhibits can cause confusion if an outage is happening and, due to inhibits, the
monitoring system says that everything is fine. Teams that depend on the service will be
confused when their dashboards show it being up, yet their service is malfunctioning due
to the outage. Therefore we recommend using inhibits sparingly and carefully.
The difference between a silence and an inhibit is very subtle. You silence an alert
when you know it would be erroneous to page someone based on some condition such as