Terms to Know
Outage: A user-visible lack of service.
Failure: A system, subsystem, or component that has stopped working.
Malfunction: Used interchangeably with “failure.”
Server: Software that provides a function or API. (Not a piece of hardware.)
Service: A user-visible system or product composed of one or more servers.
Machine: A virtual or physical machine.
QPS: Queries per second. Usually the number of web hits or API calls received
per second.
Such a strategy is predictive, meaning that it predicts the likelihood of failure or the
reliability of the system.
Resilient systems continue where predictive strategies leave off. Assuming that failure
will happen, we build systems that react and respond intelligently so that the system as a
whole survives and continues to provide service. In other words, resilient systems are
responsive to failure. Rather than avoiding failure through better hardware or responding to
it with human effort (and apologies), they take a proactive stance and put in place
mechanisms that expect and survive failure.
Resilient systems decouple component failure from user-visible outages. In traditional
computing, where there is a failed component, there is a user-visible outage. When we build
survivable systems, the two concepts are decoupled.
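A rough worked example shows why decoupling matters. It rests on an illustrative simplification that is not a claim from the text: each of n interchangeable replicas is down with independent probability p, and the service stays up as long as at least one replica is healthy.

```latex
\begin{aligned}
P(\text{outage} \mid \text{1 server}) &= p \\
P(\text{outage} \mid n \text{ replicas}) &= p^{n} \\
p = 0.01,\ n = 3 &\;\Rightarrow\; p^{3} = 10^{-6}
\end{aligned}
```

In practice failures are often correlated (shared power, a shared network, or a bad software push), so p^n is an optimistic estimate rather than a prediction.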
This chapter is about the various ways we can design systems that detect failure and
work around it. This is how we build survivable systems. The techniques are grouped into
four categories: physical failures, attacks, human errors, and unexpected load.
6.1 Software Resiliency Beats Hardware Reliability
You can build a reliable system by selecting better hardware or better software. Better
hardware means special-purpose CPUs, components, and storage systems. Better software
means adding intelligence to a system so that it detects failures and works around them.
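A minimal sketch of "detects failures and works around them" in software. It assumes each server is reachable through a hypothetical zero-argument callable and treats any raised exception as a detected failure; `call_with_failover` and `AllReplicasFailed` are illustrative names, not an existing library API. A failed replica is simply skipped, so a component failure becomes a user-visible outage only when every replica has failed.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised only when every replica has failed: the user-visible outage."""


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Try each replica in turn; a failed component is skipped, not surfaced.

    `replicas` is a hypothetical list of zero-argument callables, each standing
    in for a request to one server. Any exception counts as a detected failure.
    """
    errors: list[Exception] = []
    for replica in replicas:
        try:
            return replica()  # first healthy replica answers the request
        except Exception as exc:  # failure detected: record it, try the next one
            errors.append(exc)
    # Only when all replicas fail does the component failure become an outage.
    raise AllReplicasFailed(f"{len(errors)} replica(s) failed: {errors!r}")


# Usage: the first replica is down, but the user still gets an answer.
def broken() -> str:
    raise ConnectionError("server down")


def healthy() -> str:
    return "ok"


print(call_with_failover([broken, healthy]))  # prints "ok"
```

A real system would typically add timeouts, health checks, and backoff, and would remember which replicas are unhealthy instead of retrying them on every request.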
Software solutions are favored for many reasons. First and foremost, they are more
economical. Once software is written, it can be applied to many services and many machines
with no additional cost (assuming it is home-grown, is open source, or does not require
a per-machine license). Software is also more malleable than hardware. It is easier to fix,
upgrade, and replace. Unlike hardware upgrades, software upgrades can be automated. As