that survives outages? ROC reasoned that if the software already survived outages, why
pay for extra quality at every level? Using the cheaper commodity components might result
in less reliable servers, but software techniques would result in a system that was more
reliable as a whole. Since the software had to be developed anyway, this made a lot of sense.
Digital electronics either work or they don't. The ability to provide service is binary:
the computer is on or off. This is called the “run run run dead” problem. Digital electronics
typically run and run and run until they fail. At that point, they fail completely. The system
stops working. Compare this to an analog system such as an old tube-based radio. When it
is new, it works fine. Over time components start to wear out, reducing audio quality. The
user hears more static, but the radio is usable. The amount of static increases slowly, giv-
ing the user months of warning before the radio is unusable. Instead of “run run run dead,”
analog systems degrade slowly.
With distributed computing techniques, each individual machine is still digital: it is
either running or not. However, the collection of machines is more like the analog radio:
the system works but performance drops as more and more machines fail. A single machine
being down is not a cause for alarm but rather a signal that it must be repaired before too
many other machines have also failed.
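To make this concrete, the short Python sketch below (illustrative only; the machine count and per-machine capacity are assumed, not taken from any real deployment) shows how the aggregate capacity of a cluster shrinks gradually as individual machines fail, rather than dropping to zero all at once:

# Illustrative sketch of graceful degradation in a cluster of identical
# commodity servers. The numbers are assumptions, not measurements.
CLUSTER_SIZE = 40        # machines in the cluster (assumed)
QPS_PER_MACHINE = 100    # capacity each machine contributes (assumed)

def cluster_capacity(failed_machines: int) -> int:
    """Aggregate capacity remaining after some machines have failed."""
    healthy = max(CLUSTER_SIZE - failed_machines, 0)
    return healthy * QPS_PER_MACHINE

# Like the analog radio, capacity degrades gradually rather than all at once:
for failed in (0, 1, 5, 20, 40):
    print(f"{failed:2d} machines down -> {cluster_capacity(failed)} QPS remaining")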
Scaling
ROC researchers demonstrated that distributed computing could be reliable enough to
provide a service requiring high availability. But could it be fast enough? The answer to
this question was also “yes.” The computing power of a fully loaded Sun E10K could be
achieved with enough small, pizza box-sized machines based on commodity hardware.
Web applications were particularly well suited for distributed computing. Imagine a
simple case where the contents of a web site can be stored on one machine. A large server
might be able to deliver 4000 queries per second (QPS) of service. Suppose a commodity
server could provide only 100 QPS. It would take 40 such servers to equal the aggregate
capacity of the larger machine. Distributed computing algorithms for load balancing can
easily scale to 40 machines.
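The arithmetic can be sketched in a few lines of Python. The backend names and the round-robin policy below are assumptions made for illustration; the point is only that spreading queries over 40 commodity machines recovers the aggregate capacity of the single large server:

import itertools

# Hypothetical pool of 40 commodity servers, each assumed to handle ~100 QPS.
BACKENDS = [f"web{n:02d}" for n in range(40)]

_rotation = itertools.cycle(BACKENDS)

def pick_backend() -> str:
    """Return the next backend in round-robin order."""
    return next(_rotation)

# Each incoming query goes to a different machine in turn, so the pool's
# aggregate capacity is roughly 40 x 100 QPS = 4000 QPS.
for _ in range(5):
    print(pick_backend())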
Data Scaling
For some applications, all the data for the service might not fit on a single commodity
server. These commodity servers were too small to store a very large dataset. Applications such
as web search have a dataset, or “corpus,” that could be quite large. Researchers found that
they could resolve this issue by dividing the corpus into many “fractions,” each stored on a
different machine, or “leaf.” Other machines (called “the root”) would receive requests and
forward each one to the appropriate leaf.
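A rough sketch of this root/leaf arrangement follows. The leaf names and the hash-based assignment of terms to fractions are assumptions made for illustration; the text says only that the corpus is split into fractions, one per leaf, with the root forwarding each request to the appropriate leaf:

import hashlib

# Hypothetical set of leaf machines, each storing one fraction of the corpus.
LEAVES = [f"leaf{n:02d}" for n in range(8)]

def leaf_for(term: str) -> str:
    """Map a search term to the leaf assumed to hold that fraction of the corpus."""
    digest = hashlib.md5(term.encode()).hexdigest()
    return LEAVES[int(digest, 16) % len(LEAVES)]

def root_handle_query(term: str) -> str:
    """The root receives a request and forwards it to the appropriate leaf."""
    leaf = leaf_for(term)
    return f"forwarding query {term!r} to {leaf}"

print(root_handle_query("distributed computing"))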