You need to instrument your system enough so that you can see when things are going
to fail. To improve on KPIs, we must measure more than just the end result. By measuring
things that are deeper in the pipeline, we can make better decisions about how to achieve
our KPIs. For example, to improve availability, we need to measure the availability of the
component systems that result in the final availability statistic. There may be systems that
get overloaded, queues that grow too long, or bottlenecks that start to choke. There may be
a search tree that works best when the tree is balanced; by monitoring how balanced it is,
we can correlate performance issues with when it becomes imbalanced. To determine why
shopping carts are being abandoned, we must know if there are problems with any of the
web pages during the shopping experience.
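To make the search-tree example concrete, here is a minimal Python sketch of a balance metric for a plain (non-self-balancing) binary search tree. It assumes the tree's nodes are reachable from the monitoring code, and the names `Node`, `insert`, and `imbalance_ratio` are hypothetical. The metric divides the tree's actual height by the smallest height possible for the same number of nodes. A value of 1.0 means perfectly balanced, and larger values mean lookups walk longer paths. Exported periodically as a gauge, it could be graphed next to lookup latency so that slowdowns can be correlated with imbalance.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical BST node used only for this illustration."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Plain BST insert with no rebalancing, so insertion order shapes the tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def height(node: Optional[Node]) -> int:
    """Nodes on the longest root-to-leaf path (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def size(node: Optional[Node]) -> int:
    """Total number of nodes in the tree."""
    if node is None:
        return 0
    return 1 + size(node.left) + size(node.right)


def imbalance_ratio(root: Optional[Node]) -> float:
    """Actual height / minimum possible height for the same node count.

    1.0 means perfectly balanced; larger values mean a more lopsided tree.
    """
    n = size(root)
    if n == 0:
        return 1.0
    optimal = math.ceil(math.log2(n + 1))
    return height(root) / optimal


if __name__ == "__main__":
    balanced = None
    for k in [4, 2, 6, 1, 3, 5, 7]:
        balanced = insert(balanced, k)
    chain = None
    for k in range(1, 8):  # sorted inserts degrade the tree into a chain
        chain = insert(chain, k)
    print(imbalance_ratio(balanced))  # 1.0
    print(imbalance_ratio(chain))     # 7 / 3, about 2.33
```

The same seven keys give a ratio of 1.0 when inserted in balanced order and about 2.33 when inserted in sorted order, which is the kind of shift a dashboard would surface.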
All of the previously mentioned KPIs can be monitored by selecting the right metrics,
sometimes in combination with others. For example, determining the 90th percentile requires
a calculation over the individual samples rather than a single reading. Calculating the
average number of items per order might require two metrics: the total number of items
and the total number of orders.
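Both calculations can be sketched in a few lines of Python. This sketch uses the nearest-rank definition of a percentile, which is only one of several definitions in common use; many monitoring systems instead estimate percentiles from histogram buckets, so their results can differ slightly. The function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Smallest sample such that at least p percent of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


def avg_items_per_order(total_items: int, total_orders: int) -> float:
    """Derived metric: the ratio of two counters."""
    if total_orders == 0:
        return 0.0  # avoid dividing by zero before any orders arrive
    return total_items / total_orders


if __name__ == "__main__":
    latencies_ms = [120, 85, 300, 95, 110, 101, 99, 250, 130, 90]
    print(percentile_nearest_rank(latencies_ms, 90))  # 250
    print(avg_items_per_order(42, 12))                # 3.5
```

Note that the percentile needs the full set of samples (or a histogram of them), while the average needs only two running counters; that difference affects how much data each KPI requires the monitoring system to store.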
A diagnostic is a metric collected to aid technical processes such as debugging and
performance tuning. These metrics are not necessarily related to a KPI. For example, we
might collect a metric that helps us debug an ongoing technical issue that is intermittent
and therefore difficult to find. There is generally a minimal set of metrics one collects
from all machines: system metrics related to CPU, network bandwidth, disk space, disk
access, and so on. Collecting the same set everywhere makes management easier.
Hand-crafting a bespoke list for each machine is rarely a good use of your time.
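A minimal sketch of a single baseline collector that every host runs unchanged, using only the Python standard library. The metric names are hypothetical, and the sketch covers only CPU count, one-minute load average, and root-filesystem space. Network bandwidth and disk-access counters are platform-specific and are left out here; on Linux they can be read from `/proc/net/dev` and `/proc/diskstats`, or gathered through a third-party library such as psutil.

```python
import os
import shutil
import socket
import time

# One baseline list, applied identically to every host (hypothetical names).
BASELINE_METRICS = (
    "cpu.count",
    "cpu.load1",
    "disk.root.total_bytes",
    "disk.root.free_bytes",
)


def collect_baseline(path: str = "/") -> dict:
    """Collect the same small set of system metrics on any host."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    # os.getloadavg() is Unix-only; report NaN where it is unavailable.
    load1 = os.getloadavg()[0] if hasattr(os, "getloadavg") else float("nan")
    values = {
        "cpu.count": os.cpu_count() or 0,
        "cpu.load1": load1,
        "disk.root.total_bytes": usage.total,
        "disk.root.free_bytes": usage.free,
    }
    return {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        # Emit exactly the baseline list, in a fixed order, on every machine.
        "metrics": {name: values[name] for name in BASELINE_METRICS},
    }


if __name__ == "__main__":
    print(collect_baseline())
```

Because the list lives in one place, adding a metric to the baseline is a single change that reaches every machine, rather than an edit to many hand-maintained per-host lists.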
It is conventional wisdom in our industry to “monitor everything” in the hope of never
discovering, too late, that you wish you had historical data on a particular metric. If
this advice is taken literally, the metrics collection can overwhelm the systems being
monitored or the monitoring system itself. Find a balance by focusing on KPIs first and
adding diagnostics as needed.
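One way to act on this advice is to make the split explicit in the collector. KPI metrics are always gathered, while each diagnostic metric stays off until someone enables it for an investigation and turns it off again afterwards. The sketch assumes metrics are exposed as zero-argument callables; `MetricRegistry` and the metric names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

MetricFn = Callable[[], float]


@dataclass
class MetricRegistry:
    """Hypothetical registry: KPIs always collected, diagnostics on demand."""

    kpis: Dict[str, MetricFn] = field(default_factory=dict)
    diagnostics: Dict[str, MetricFn] = field(default_factory=dict)
    enabled: Set[str] = field(default_factory=set)

    def register(self, name: str, fn: MetricFn, kind: str = "kpi") -> None:
        if kind == "kpi":
            self.kpis[name] = fn
        elif kind == "diagnostic":
            self.diagnostics[name] = fn
        else:
            raise ValueError(f"unknown metric kind: {kind!r}")

    def enable(self, name: str) -> None:
        # Switch on a diagnostic, e.g. while chasing an intermittent bug.
        if name not in self.diagnostics:
            raise KeyError(name)
        self.enabled.add(name)

    def disable(self, name: str) -> None:
        # Switch it off again once the investigation is over.
        self.enabled.discard(name)

    def collect(self) -> Dict[str, float]:
        # KPIs on every pass; diagnostics only while they are enabled.
        sample = {name: fn() for name, fn in self.kpis.items()}
        for name in sorted(self.enabled):
            sample[name] = self.diagnostics[name]()
        return sample


if __name__ == "__main__":
    reg = MetricRegistry()
    reg.register("checkout.availability", lambda: 0.999)
    reg.register("cart.retry_count", lambda: 3.0, kind="diagnostic")
    print(reg.collect())  # KPI only
    reg.enable("cart.retry_count")
    print(reg.collect())  # KPI plus the enabled diagnostic
```

Keeping diagnostics opt-in bounds the steady-state load on both the monitored systems and the monitoring system, while still letting a metric be added quickly when a specific problem calls for it.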