that survives outages? ROC reasoned that if the software already survived outages, why
pay for extra quality at every level? Using the cheaper commodity components might result
in less reliable servers, but software techniques would result in a system that was more
reliable as a whole. Since the software had to be developed anyway, this made a lot of sense.
Digital electronics either work or they don't. The ability to provide service is binary:
the computer is on or off. This is called the “run run run dead” problem. Digital electronics
typically run and run and run until they fail. At that point, they fail completely. The system
stops working. Compare this to an analog system such as an old tube-based radio. When it
is new, it works fine. Over time components start to wear out, reducing audio quality. The
user hears more static, but the radio is usable. The amount of static increases slowly, giv-
ing the user months of warning before the radio is unusable. Instead of “run run run dead,”
analog systems degrade slowly.
With distributed computing techniques, each individual machine is still digital: it is
either running or not. However, the collection of machines is more like the analog radio:
the system works but performance drops as more and more machines fail. A single machine
being down is not a cause for alarm but rather a signal that it must be repaired before too
many other machines have also failed.
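To make this concrete, the short Python sketch below (illustrative only; the machine count and per-machine capacity are assumed, not taken from any real deployment) shows how the aggregate capacity of a cluster shrinks gradually as individual machines fail, rather than dropping to zero all at once:

# Illustrative sketch of graceful degradation in a cluster of identical
# commodity servers. The numbers are assumptions, not measurements.
CLUSTER_SIZE = 40        # machines in the cluster (assumed)
QPS_PER_MACHINE = 100    # capacity each machine contributes (assumed)

def cluster_capacity(failed_machines: int) -> int:
    """Aggregate capacity remaining after some machines have failed."""
    healthy = max(CLUSTER_SIZE - failed_machines, 0)
    return healthy * QPS_PER_MACHINE

# Like the analog radio, capacity degrades gradually rather than all at once:
for failed in (0, 1, 5, 20, 40):
    print(f"{failed:2d} machines down -> {cluster_capacity(failed)} QPS remaining")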
Scaling
ROC researchers demonstrated that distributed computing could be reliable enough to
provide a service requiring high availability. But could it be fast enough? The answer to
this question was also “yes.” The computing power of a fully loaded Sun E10K could be
achieved with enough small, pizza box-sized machines based on commodity hardware.
Web applications were particularly well suited for distributed computing. Imagine a
simple case where the contents of a web site can be stored on one machine. A large server
might be able to deliver 4000 queries per second (QPS) of service. Suppose a commodity
server could provide only 100 QPS. It would take 40 such servers to equal the aggregate
capacity of the larger machine. Distributed computing algorithms for load balancing can
easily scale to 40 machines.
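The arithmetic can be sketched in a few lines of Python. The backend names and the round-robin policy below are assumptions made for illustration; the point is only that spreading queries over 40 commodity machines recovers the aggregate capacity of the single large server:

import itertools

# Hypothetical pool of 40 commodity servers, each assumed to handle ~100 QPS.
BACKENDS = [f"web{n:02d}" for n in range(40)]

_rotation = itertools.cycle(BACKENDS)

def pick_backend() -> str:
    """Return the next backend in round-robin order."""
    return next(_rotation)

# Each incoming query goes to a different machine in turn, so the pool's
# aggregate capacity is roughly 40 x 100 QPS = 4000 QPS.
for _ in range(5):
    print(pick_backend())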
Data Scaling
For some applications, all the data for the service might not fit on a single commodity
server. These commodity servers were too small to store a very large dataset. Applications such
as web search have a dataset, or “corpus,” that could be quite large. Researchers found that
they could resolve this issue by dividing the corpus into many “fractions,” each stored on a
different machine, or “leaf.” Other machines (called “the root”) would receive requests and
forward each one to the appropriate leaf.
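A rough sketch of this root/leaf arrangement follows. The leaf names and the hash-based assignment of terms to fractions are assumptions made for illustration; the text says only that the corpus is split into fractions, one per leaf, with the root forwarding each request to the appropriate leaf:

import hashlib

# Hypothetical set of leaf machines, each storing one fraction of the corpus.
LEAVES = [f"leaf{n:02d}" for n in range(8)]

def leaf_for(term: str) -> str:
    """Map a search term to the leaf assumed to hold that fraction of the corpus."""
    digest = hashlib.md5(term.encode()).hexdigest()
    return LEAVES[int(digest, 16) % len(LEAVES)]

def root_handle_query(term: str) -> str:
    """The root receives a request and forwards it to the appropriate leaf."""
    leaf = leaf_for(term)
    return f"forwarding query {term!r} to {leaf}"

print(root_handle_query("distributed computing"))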