The analysis system extracts meaning from the data. There may be many different analysis
systems, each providing services such as anomaly detection, forecasting, or data mining.
Some analysis occurs in real time, happening as the data is gathered. Short-term analysis
focuses on recent data or provides the random access needed by specific applications
such as dashboards. Long-term analysis examines large spans of data to detect trends over
many years. It is often done in batch mode, storing intermediate results for later use.
Alerting and escalation systems reach out to find people when manual intervention is
needed, finding a substitute when a person doesn't respond within a certain amount of time.
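The escalation behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Responder type, the escalate function, and the page callback are hypothetical names, and the callback is assumed to notify one person, wait up to the acknowledgment timeout, and report whether that person responded in time.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Responder:
    """A person who can be paged (hypothetical type for this sketch)."""
    name: str


def escalate(
    chain: Sequence[Responder],
    page: Callable[[Responder], bool],
) -> Optional[Responder]:
    """Page each responder in order and return the first who acknowledges.

    `page` is an assumed callback: it notifies one person, waits up to the
    acknowledgment timeout, and returns True if they responded in time.
    """
    for responder in chain:
        if page(responder):
            return responder
    return None  # nobody acknowledged; the caller decides what happens next


# Usage with a fake pager in which only the secondary answers.
chain = [Responder("primary"), Responder("secondary"), Responder("manager")]
available = {"secondary"}
paged = []


def fake_page(responder: Responder) -> bool:
    paged.append(responder.name)
    return responder.name in available


acknowledged_by = escalate(chain, fake_page)
```

Keeping the timeout inside the callback keeps the escalation loop itself trivial to test. A real system would also need to decide what to do when the whole chain is exhausted.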
Visualization systems provide graphs and dashboards. They can combine and transform
data and do operations such as calculating percentiles, building histograms, and
determining stack ranks.
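As a rough illustration of those three operations, the following Python sketch computes a nearest-rank percentile, a fixed-width histogram, and a stack rank. The function names and the sample data are assumptions made for this example. Nearest-rank is only one of several common percentile definitions, and a real visualization system may use a different one.

```python
import math
from collections import Counter


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def histogram(values, bucket_width):
    """Count samples per fixed-width bucket, keyed by the bucket's lower bound."""
    counts = Counter((v // bucket_width) * bucket_width for v in values)
    return dict(sorted(counts.items()))


def stack_rank(totals):
    """Return the names ordered from highest to lowest value."""
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical sample data: request latencies and per-service error counts.
latencies_ms = [12, 15, 11, 240, 18, 14, 13, 95, 16, 17]
errors_by_service = {"web": 120, "db": 340, "cache": 75}

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
buckets = histogram(latencies_ms, 50)
ranking = stack_rank(errors_by_service)
```

In this sample the median stays low while the 90th percentile is pulled up by the two slow requests, which is why dashboards often show percentiles rather than averages.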
All of this is tied together by a configuration manager that directs all of the other
components in their work. Changes to configurations can be distributed in many ways, often
by distributing configuration files or by using more dynamic systems such as ZooKeeper.
When monitoring systems are multitenant, we empower individual service teams to control
their own monitoring configurations. They benefit from centralized components, freeing
them from having to worry about capacity planning and other operational duties.
When all the components work together, we have a monitoring system that is scalable,
reliable, and functional.
Exercises
1. What are the components of the monitoring system?
2. Pick three components of the monitoring system and describe them in detail.
3. Do all monitoring systems have all the components described in this chapter? Give
examples of why components may be optional.
4. What is a pager storm, and what are the ways to deal with one?
5. Research the JSON format for representing data. Design a JSON format for the
collection of metrics.
6. Describe the monitoring system in use in your organization or one you've had
experience with in the past. How is it used? What does it monitor? Which problems
does it solve?
7. Create one or more methods of calculating a rate for a counter metric. The method
should work even if there is a counter reset. Can your method also calculate a
margin of error?
8. Design a better monitoring system for your current environment.