Design Patterns for Resiliency - The Practice of Cloud System Administration

unable to provide enough juice to spin up dozens of disks at once. Old solder joints shrink

and crack, leading to mysterious failures. Components from the same manufacturing batch

have similar mortality curves, resulting in a sudden rush of failures.

With our discussion of the many potential malfunctions and failures, we hope we

haven't scared you away from the field of system administration!

6.2.2 The Traditional Approach

Traditional software assumes a perfect, malfunction-free world. This leaves the hardware

systems engineer with the impossible task of delivering hardware that never fails. We fake

it by using redundant array of inexpensive [independent] disks (RAID) systems that let the

software go on pretending that disks never fail. Sheltered from the reality of a world full of

malfunctions, we enable software developers to continue writing software that assumes a

perfect, malfunction-free world (which, of course, does not exist).

For example, UNIX applications are written with the assumption that reading and writing

files will happen without error. As a result, applications do not check for errors when

writing files. If they did, it would be a waste of time because the blocks may not be written

to disk until later, possibly after the application has exited. Microsoft Word is written

with the assumption that the computer it runs on will continue to run.

Hyperbole Warning

The previous paragraph included two slight exaggerations. The application layer

of UNIX assumes a perfect file system but the underlying layers do not assume

perfect disks. Microsoft Word checkpoints documents so that the user does not lose

data in the event of a crash. However, during that crash the user is unable to edit

the document.
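
To make the file-writing assumption concrete, here is a minimal Python sketch of what checking for write errors can look like. It is not taken from the book: the function name write_durably and the file name are hypothetical, and the sketch assumes a POSIX-like system where os.fsync asks the kernel to flush its cached blocks to the storage device. Because blocks may be written later, an error can surface at the fsync call rather than at the write call, so both are checked.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data to path; raise OSError if any step reports a failure.

    Hypothetical helper for illustration only.
    """
    # Unbuffered binary mode: each write() goes straight to the OS.
    with open(path, "wb", buffering=0) as f:
        view = memoryview(data)
        while view:
            written = f.write(view)  # may accept fewer bytes than offered
            view = view[written:]
        # Ask the kernel to push cached blocks to the device. Errors from
        # deferred writes can surface here instead of at write().
        os.fsync(f.fileno())


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "example.txt")
    try:
        write_durably(target, b"hello\n")
    except OSError as err:
        print(f"write failed: {err}")
    else:
        print("write completed without a reported error")
```

Even with these checks, the application only learns what the operating system and the device choose to report, which is why the layers underneath still have to cope with imperfect disks.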

Attempts to achieve this impossible malfunction-free world cause companies to spend

a lot of money. CPUs, components, and storage systems known for high reliability are

demonstrably more expensive than commodity parts. Appendix B details the history of this

strategy and explains the economic benefits of distributed computing techniques discussed

in this chapter.

6.2.3 The Distributed Computing Approach

Distributed computing, in contrast to the traditional approach, embraces components' failures

and malfunctions. It takes a reality-based approach that accepts malfunctions as a fact

of life. Google Docs continues to let a user edit a document even if a machine fails at

Google: another machine takes over and the user does not even notice the handoff.
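
The handoff idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how Google Docs works: the replicas are modeled as plain functions, and the names call_with_failover, broken_machine, healthy_machine, and AllReplicasFailed are all hypothetical. The point is only that the caller treats one machine's failure as routine and moves on to the next.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when no replica could serve the request."""


def call_with_failover(
    replicas: Sequence[Callable[[str], str]], request: str
) -> str:
    """Try each replica in order and return the first successful reply."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as err:
            # A failed machine is expected: record it and try the next one.
            errors.append(err)
    raise AllReplicasFailed(f"{len(errors)} replica(s) failed for {request!r}")


def broken_machine(request: str) -> str:
    """Stand-in for a machine that has failed (hypothetical)."""
    raise ConnectionError("machine is down")


def healthy_machine(request: str) -> str:
    """Stand-in for a working machine (hypothetical)."""
    return f"saved: {request}"


if __name__ == "__main__":
    # The first machine fails, the second takes over.
    print(call_with_failover([broken_machine, healthy_machine], "edit #1"))
    # prints "saved: edit #1"
```

The caller sees only the successful reply. An error reaches the caller only when every replica has failed.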
