17.7 Configuration
The six monitoring components discussed so far all need configuration information to direct
their work. The sensing system needs to know which data to measure and how often.
The collection system needs to know what to collect and where to send it. The analysis
system has a base of formulas to process. The alerting system needs to know who to alert,
how, and who to escalate to. The visualization system needs to know which graphs to generate
and how to do so. The storage system needs to know how to store and access the data.
These configurations should be treated like any other software source code: kept under
revision control, tested using both unit tests and system tests, and so on. Revision control
tracks changes to a file over time, enabling one to see what a file looked like at any point in
its history. A unit test framework would take as input time-series data for one or more metrics
and output whether the alert would trigger. This permits one to validate alert formulas.
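
The unit test idea above can be sketched as follows. This is a minimal illustration, not the interface of any particular monitoring product: the function alert_triggers, the 500 ms threshold, and the "three samples in a row" rule are all assumptions made up for the example.

```python
import unittest


def alert_triggers(samples, threshold, min_consecutive):
    """Return True if `min_consecutive` samples in a row exceed `threshold`.

    `samples` is a list of (timestamp, value) pairs in time order.
    This rule is a hypothetical alert formula used only for illustration.
    """
    run = 0
    for _timestamp, value in samples:
        # Extend the current run of high samples, or reset it.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


class HighLatencyAlertTest(unittest.TestCase):
    # Assumed rule: alert when latency stays above 500 ms for 3 samples.
    THRESHOLD_MS = 500
    MIN_CONSECUTIVE = 3

    def check(self, values):
        # Use the list index as a stand-in timestamp.
        samples = list(enumerate(values))
        return alert_triggers(samples, self.THRESHOLD_MS, self.MIN_CONSECUTIVE)

    def test_sustained_spike_triggers(self):
        self.assertTrue(self.check([120, 510, 620, 700, 130]))

    def test_brief_spike_does_not_trigger(self):
        self.assertFalse(self.check([120, 510, 620, 130, 700]))
```

Saved to a file, the tests can be run with python -m unittest. Each test feeds canned time-series data to the formula and asserts whether the alert fires, so a change to the rule that breaks an expected case is caught before it reaches production.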
In distributed monitoring systems, each component may be separated out and perhaps
replicated or sharded. Each piece needs a way to access the configuration. A system like
ZooKeeper, discussed in Section 11.7, can be used to distribute a pointer to where the full
configuration can be found, which is often a source code or package repository.
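
The pointer arrangement can be sketched roughly as below. The sketch deliberately avoids any real ZooKeeper client: two plain dictionaries stand in for the coordination service and the repository, and the path /monitoring/config_pointer, the repo:// strings, and the configuration keys are all invented for illustration.

```python
# Stand-ins, not real APIs: one dict plays the coordination service
# (ZooKeeper in the text) and another plays the repository.
POINTER_PATH = "/monitoring/config_pointer"  # hypothetical path

coordination_store = {
    POINTER_PATH: "repo://monitoring-config@rev-42",
}
repository = {
    "repo://monitoring-config@rev-42": {
        "collector": {"interval_seconds": 60},
        "alerting": {"escalation_minutes": 15},
    },
}


def load_config(component):
    """Follow the small pointer, then fetch this component's configuration."""
    pointer = coordination_store[POINTER_PATH]
    full_config = repository[pointer]
    return full_config[component]


def publish(pointer, full_config):
    """Store a new configuration revision, then move the pointer to it."""
    repository[pointer] = full_config
    coordination_store[POINTER_PATH] = pointer


print(load_config("collector"))
publish(
    "repo://monitoring-config@rev-43",
    {"collector": {"interval_seconds": 30}, "alerting": {"escalation_minutes": 15}},
)
print(load_config("collector"))
```

Only the short pointer travels through the coordination service. The full configuration stays in the repository, where it remains under revision control, and every replica or shard that follows the pointer sees the same revision.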
Some monitoring systems are multitenant: a single monitoring system permits
many teams to independently control metric collection, alert rules, and so on. By centralizing
the service but decentralizing the ability to use it, we empower service owners, developers,
and others to do their own monitoring and benefit from the ability to automatically
collect and analyze data. Other monitoring systems achieve the same goal by making it
easy for individuals to install their own instance of the monitoring system, or just their own
sensing and collection components while centralizing storage and other components.
17.8 Summary
Monitoring systems are complex, with many components working together.
The sensing and measurement component takes measurements. Whitebox monitoring
monitors the system's internals. Blackbox monitoring collects data from the perspective of
a user. Gauges measure an amount that varies. Counters are non-decreasing indicators of
how many times something has happened.
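
The difference between the two metric types can be made concrete with a small sketch. The class names and the rate_per_second helper are assumptions for illustration and do not come from any specific monitoring library.

```python
class Gauge:
    """An amount that can go up or down, such as a queue length."""

    def __init__(self):
        self.value = 0

    def set(self, value):
        self.value = value


class Counter:
    """A running total that never decreases, such as requests served."""

    def __init__(self):
        self.value = 0

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a counter cannot decrease")
        self.value += amount


def rate_per_second(earlier, later):
    """Turn two (timestamp, counter_value) readings into an average rate."""
    (t0, v0), (t1, v1) = earlier, later
    return (v1 - v0) / (t1 - t0)


queue_length = Gauge()
queue_length.set(12)
queue_length.set(7)        # a gauge may fall as well as rise

requests_served = Counter()
requests_served.increment()
requests_served.increment(4)

# Two counter readings taken 10 seconds apart give a rate.
print(rate_per_second((0, 100), (10, 150)))  # 5.0
```

Because a counter only grows, the difference between two readings is always a count of events in that interval, which is why counters are commonly converted to rates during analysis.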
The collection system gathers the metrics. Metrics may be pushed (sent) to the collection
system, or the collection system may pull (query) systems to gather the metrics.
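
Push and pull can be contrasted in a few lines. This is an in-process sketch only: a real collection system would communicate over a network, and the Collector class and its method names are hypothetical.

```python
class Collector:
    """Toy collection system that supports both push and pull."""

    def __init__(self):
        self.metrics = {}   # latest value seen for each metric name
        self.targets = {}   # metric name -> function that reads the value

    def receive(self, name, value):
        """Push: the monitored system sends a measurement to the collector."""
        self.metrics[name] = value

    def register(self, name, read_fn):
        """Tell the collector how to query a system for one metric."""
        self.targets[name] = read_fn

    def poll(self):
        """Pull: the collector queries every registered system."""
        for name, read_fn in self.targets.items():
            self.metrics[name] = read_fn()


collector = Collector()

# Push: the application decides when to report.
collector.receive("app.logins", 3)

# Pull: the collector decides when to ask.
collector.register("db.connections", lambda: 17)
collector.poll()

print(collector.metrics)
```

In the push case the sender controls timing and needs to know where the collector is. In the pull case the collector controls timing and needs to know which systems to query.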
The storage system stores the metrics. Usually custom databases are used to handle the
large volume of incoming data and take advantage of the unique qualities of time-series
data.