ture request so that the vendor can see the context of the request. (See Section 14.3.2 for
more details on writing good postmortem reports.)
If the vendor is unresponsive to your requests, you may be able to write code that builds
frameworks around the vendor's software. For example, you might create a wrapper that
provides startup and shutdown services in a clean manner around vendor software that
handles those tasks ungracefully. We highly recommend publishing such systems externally
as open source products. If you need them, someone else will, too. Developing a community
around your code will make its support less dependent on your own efforts.
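As a rough illustration of such a wrapper, the following Python sketch starts a process and guarantees an orderly shutdown: it first asks the process to exit, then forces it only if the grace period runs out. The name managed_service and the grace_seconds parameter are hypothetical, chosen for this example; a real wrapper would need whatever vendor-specific startup and shutdown steps apply.

```python
import contextlib
import subprocess


@contextlib.contextmanager
def managed_service(command, grace_seconds=10.0):
    """Run `command` as a child process and always shut it down cleanly.

    Hypothetical helper: `command` is an argument list such as
    ["/opt/vendor/bin/server", "--config", "server.conf"].
    """
    proc = subprocess.Popen(command)
    try:
        yield proc
    finally:
        # Only act if the child is still running.
        if proc.poll() is None:
            proc.terminate()  # polite request (SIGTERM on POSIX)
            try:
                proc.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                proc.kill()  # last resort if the request was ignored
                proc.wait()
```

Used as a context manager, the wrapper ensures the child process never outlives the block that started it, even when that block raises an exception.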
2.3 Improving the Model
Good design for operations makes operations easy. Great design for operations helps eliminate
some operational duties entirely. It's a force multiplier often equivalent to hiring an
extra person. When possible, strive to create systems that embed knowledge or capability
into the process, replacing the need for operational intervention. The job of the operations
staff then changes from performing repetitive operational tasks to building, maintaining,
and improving the automation that handles those tasks.
Tom once worked in an environment where resource allocations were requested via
email and processed manually. Once an API was made available, the entire process became
self-service; users could manage their own resources.
Some provisioning systems let you specify how much RAM, disk, and CPU each “job”
will need. A better system does not require you to specify any resources at all: it monitors
use and allocates what is needed, maintaining an effective balance among all the jobs on a
cluster, and reallocating them and shifting resources around over time.
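One very simplified way to picture such a system is a periodic rebalancing step that sizes each job from its observed use rather than from a user-supplied request. The sketch below is an assumption-laden toy, not a description of any real scheduler: the function name rebalance, the 20 percent headroom, and the proportional scale-down policy are all invented for illustration.

```python
def rebalance(observed_usage, capacity, headroom=0.2):
    """Return a new allocation per job based on measured use.

    observed_usage: dict mapping job name to recently measured use
                    (for example, peak RAM in GB).
    capacity:       total amount of the resource in the cluster.
    headroom:       fraction of extra room granted above measured use.
    """
    # Give every job what it actually used, plus some room to grow.
    wanted = {job: used * (1.0 + headroom) for job, used in observed_usage.items()}
    total = sum(wanted.values())
    if total <= capacity:
        return wanted
    # Oversubscribed: shrink every job by the same factor so the
    # allocations sum to the cluster capacity.
    scale = capacity / total
    return {job: amount * scale for job, amount in wanted.items()}
```

Running a step like this on a schedule is what lets allocations shift over time as each job's measured use changes.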
A common operational task is future capacity planning—that is, predicting how many
resources will be needed 3 to 12 months out. It can be a lot of work. Alternatively, a
thoughtfully constructed data collection and analysis system can make these predictions for
you. For more information on capacity planning, see Chapter 18.
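As a minimal sketch of automated prediction, the function below fits a straight line to monthly usage samples by ordinary least squares and extrapolates it forward. The linear-growth assumption and the name forecast_linear are ours, for illustration only; real usage often has seasonal or nonlinear patterns that call for a richer model.

```python
def forecast_linear(history, months_ahead):
    """Extrapolate a least-squares trend line fitted to `history`.

    history:      usage samples, one per month, oldest first.
    months_ahead: how many months past the last sample to predict.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2  # mean of the month indices 0 .. n-1
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The last sample sits at index n - 1, so predict at n - 1 + months_ahead.
    return intercept + slope * (n - 1 + months_ahead)
```

For example, forecast_linear(disk_tb_per_month, 12) would estimate usage a year out, where disk_tb_per_month is a hypothetical list of collected samples.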
Creating alert thresholds and fine-tuning them can be an endless task. That work can
be eliminated if the monitoring system sets its own thresholds. For example, one web site
developed an accurate prediction model for how many QPS it should receive every hour
of the year. The system administrators could then set an alert if the actual QPS was more
than 10 percent above or below the prediction. Monitoring hundreds of replicas around the
world can't be done manually without huge investments in staff. By eliminating this operational
duty, the system scaled better and required less operational support.
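The alert rule in this example reduces to a relative-deviation check. The sketch below assumes the prediction model already exists and supplies predicted_qps; the function name qps_out_of_band and its parameters are hypothetical.

```python
def qps_out_of_band(actual_qps, predicted_qps, tolerance=0.10):
    """Return True if actual QPS deviates from the prediction by more
    than `tolerance` (0.10 means 10 percent) in either direction."""
    if predicted_qps <= 0:
        raise ValueError("predicted_qps must be positive")
    deviation = abs(actual_qps - predicted_qps) / predicted_qps
    return deviation > tolerance
```

Because the threshold is expressed relative to the prediction, the same rule works unchanged for every replica and every hour, with no per-site tuning.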