The Minimum Monitor Problem
A common interview question is “If you could monitor only three aspects of a web
server, what would they be?” This is an excellent test of technical knowledge and
logical thinking. It requires you to use your technical knowledge to find one metric
that can proxy for many possible problems.
For example, much can be learned by performing an HTTPS GET: We learn
whether the server is up, if the service is overloaded, and if there is network
congestion. TCP timings indicate time to first byte and time to full payload. The SSL
transaction can be analyzed to monitor SSL certificate validity and expiration. The
other two metrics can be used to differentiate between those issues. Knowing CPU
utilization can help differentiate between network congestion and an overloaded
system. Monitoring the amount of free disk space can indicate runaway processes,
logs filling the disk, and many other problems.
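The HTTPS GET check described above can be sketched with the Python standard library. This is a minimal illustration, not a production probe: the function names probe and days_until_expiry are hypothetical, the time to first byte is approximated by the moment the response headers have been parsed, and the host you pass in is your own choice.

```python
import http.client
import ssl
import time


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a certificate's notAfter string."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400.0


def probe(host: str, path: str = "/", timeout: float = 10.0) -> dict:
    """Perform one HTTPS GET; return status, timings, and certificate lifetime."""
    start = time.monotonic()
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        # Connect explicitly so the certificate can be read before the
        # connection is possibly closed by the server's response.
        conn.connect()
        cert = conn.sock.getpeercert()
        conn.request("GET", path)
        response = conn.getresponse()  # returns once status line and headers are read
        first_byte = time.monotonic() - start  # approximation of time to first byte
        body = response.read()
        total = time.monotonic() - start  # time to full payload
    finally:
        conn.close()
    return {
        "status": response.status,
        "seconds_to_first_byte": first_byte,
        "seconds_to_full_payload": total,
        "payload_bytes": len(body),
        "cert_days_left": days_until_expiry(cert["notAfter"], time.time()),
    }
```

A connection error or timeout raised by probe is itself a signal that the server is down or unreachable. Slow timings alone cannot separate overload from congestion, which is where the CPU metric comes in.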
We recommend that you blow the interviewer away by offering to do all that
while measuring one metric and one metric only. Assuming it is an e-commerce
site, simply measure revenue. If it drops in a way that is uncharacteristic for that
time of day, the site is overloaded. If it stops, the site is down (and if it isn't down,
there is reason to investigate anyway). If it ramps up, we know we're going to run
out of capacity soon. It is the one KPI that ties everything together.
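The revenue rule above can be expressed as a small classifier. This is a sketch under stated assumptions: the function name classify_revenue, the 0.5 and 1.5 ratios, and the use of a plain average of past revenue for the same hour as the baseline are all illustrative choices, not values given in the text.

```python
from statistics import mean


def classify_revenue(current: float, same_hour_history: list[float],
                     drop_ratio: float = 0.5, surge_ratio: float = 1.5) -> str:
    """Compare current revenue with what is typical for this time of day."""
    baseline = mean(same_hour_history)  # assumed baseline: average of past days
    if current == 0 and baseline > 0:
        return "down-or-investigate"    # revenue stopped
    if current < drop_ratio * baseline:
        return "possible-overload"      # uncharacteristic drop
    if current > surge_ratio * baseline:
        return "capacity-risk"          # ramping up toward capacity limits
    return "normal"
```

For example, with a history of 100, 120, and 110 for the same hour, a current value of 40 is classified as a possible overload, while 200 is flagged as a capacity risk.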
16.4 Retention
Retention is how long collected metric data is stored. After the retention time has elapsed,
the old metric data is expired, downsampled, or deleted from the storage system.
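To make the expire and downsample steps concrete, here is a minimal sketch that operates on an in-memory list of (timestamp, value) pairs. The function name apply_retention, its parameters, and the choice of averaging into fixed buckets are assumptions for illustration; real storage systems implement this in their own ways.

```python
from collections import defaultdict
from statistics import mean


def apply_retention(points, now, full_res_age, max_age, bucket=3600):
    """Return points after retention is applied.

    - age <= full_res_age: kept at full resolution
    - full_res_age < age <= max_age: averaged into `bucket`-second buckets
    - age > max_age: dropped
    """
    recent = []
    buckets = defaultdict(list)
    for ts, value in points:
        age = now - ts
        if age > max_age:
            continue  # expired
        if age > full_res_age:
            buckets[ts - ts % bucket].append(value)  # key is the bucket start time
        else:
            recent.append((ts, value))
    downsampled = [(start, mean(vals)) for start, vals in sorted(buckets.items())]
    return downsampled + sorted(recent)
```

Downsampling keeps long-term trends visible at a fraction of the storage, at the price of losing short spikes in the older data.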
How long monitoring data is retained differs for each organization and service.
Generally there is a desire or temptation to store all metrics forever. This avoids the problem of
suddenly realizing the data you want was deleted. It is also simpler than having to decide
on a storage time for each metric.
Unfortunately, storing data forever has a cost, not just in terms of hardware for storage,
but also in terms of backups, power, and complexity. Store enough data, and it must be
split between multiple storage systems, which is complex. There may also be legal issues
around how long the data should be retained.
Creating your retention policy should start with collecting business requirements and
goals, and translating them into requirements for the storage system.
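One way to translate a retention goal into a storage requirement is a back-of-the-envelope size estimate. The sketch below assumes 16 bytes per uncompressed sample (an 8-byte timestamp plus an 8-byte value); that figure and the function name storage_bytes are assumptions, and real systems with compression or indexes will differ.

```python
def storage_bytes(num_metrics: int, interval_seconds: int,
                  retention_days: int, bytes_per_sample: int = 16) -> int:
    """Rough raw storage needed for full-resolution metric data."""
    samples_per_day = 86400 // interval_seconds
    return num_metrics * samples_per_day * retention_days * bytes_per_sample
```

For instance, 1,000 metrics sampled every 60 seconds and kept for 730 days come to roughly 16.8 GB before backups and replication.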
Two years is considered to be the minimum storage period because it enables year-over-year
comparisons. More is better. It is likely that your next monitoring system will be un-