To make the system more resilient to failures, each fraction could be stored on two dif-
ferent leaves. If there were 10 fractions, there would be 20 leaves. The root would divide
the traffic for a particular fraction among the two leaves as long as both were up. If one
failed, the root would send all requests related to that fraction to the remaining leaf. A
simultaneous failure of two leaves holding the same data was unlikely. Even if
it did happen, users might not notice that their web searches returned slightly fewer results
until the replacement algorithms loaded the missing data onto a spare machine.
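
The routing behavior described above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the Root class, the leaf names, and the round-robin
policy for dividing traffic are all hypothetical, since the text does not say how the root
actually split requests between the two leaves.

```python
import itertools


class Root:
    """Hypothetical root that routes per-fraction requests to replica leaves."""

    def __init__(self, replicas):
        # replicas: dict mapping fraction id -> list of leaf names holding that fraction
        self.replicas = replicas
        self.down = set()
        # One counter per fraction, used to alternate between live replicas
        # (round-robin is an assumption, not something the text specifies).
        self._counters = {f: itertools.count() for f in replicas}

    def mark_down(self, leaf):
        self.down.add(leaf)

    def mark_up(self, leaf):
        self.down.discard(leaf)

    def pick_leaf(self, fraction):
        """Return a live leaf for the fraction, or None if every replica is down."""
        live = [leaf for leaf in self.replicas[fraction] if leaf not in self.down]
        if not live:
            return None
        return live[next(self._counters[fraction]) % len(live)]

    def plan_query(self):
        """Pick one leaf per fraction and list the fractions that cannot be served."""
        plan, missing = {}, []
        for fraction in sorted(self.replicas):
            leaf = self.pick_leaf(fraction)
            if leaf is None:
                missing.append(fraction)  # results from this fraction are skipped
            else:
                plan[fraction] = leaf
        return plan, missing


def build_root(num_fractions=10, copies=2):
    """Build the 10-fraction, 20-leaf layout from the text (leaf names are made up)."""
    replicas = {
        f: [f"leaf-{f}-{c}" for c in range(copies)] for f in range(num_fractions)
    }
    return Root(replicas)
```

With both leaves of a fraction up, pick_leaf alternates between them. With one down, every
request goes to the survivor. With both down, plan_query reports the fraction as missing and
the search proceeds with slightly fewer results.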
Scaling was also achieved through replication. If the system did not process requests
fast enough, it could be scaled by adding leaves. A particular fraction might be stored in
three or more places.
The algorithms got more sophisticated over time. For example, rather than splitting the
corpus into 10 fractions, one for each machine, the corpus could be split into 100 fractions
and each machine would store 10. If a particular fraction was receiving a particularly large
number of hits (it was “hot”), that fraction could be placed on more machines, bumping out
less popular fractions. Better algorithms resulted in better placement, diversity, and
dynamically updatable corpus data.
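
One way to picture hot-fraction placement is the greedy sketch below. It is not the algorithm
the text refers to, which is not described in detail. The function names, the hits-per-copy
heuristic, and the idea of a fixed number of storage slots per machine are all assumptions
made for illustration.

```python
import heapq


def replica_counts(hits, total_slots, max_copies):
    """Give every fraction one copy, then hand spare slots to the hottest fractions."""
    if total_slots < len(hits):
        raise ValueError("not enough slots to store every fraction once")
    copies = {f: 1 for f in hits}
    # heapq is a min-heap, so store negated "hits per existing copy".
    heap = [(-h, f) for f, h in hits.items()]
    heapq.heapify(heap)
    spare = total_slots - len(hits)
    while spare > 0 and heap:
        _, f = heapq.heappop(heap)
        if copies[f] >= max_copies:
            continue  # already on every machine, so stop considering it
        copies[f] += 1
        spare -= 1
        heapq.heappush(heap, (-hits[f] / copies[f], f))
    return copies


def place(copies, machines):
    """Put each fraction's copies on distinct machines, least-loaded machines first."""
    load = {m: [] for m in machines}
    for f, n in sorted(copies.items(), key=lambda kv: -kv[1]):
        targets = sorted(machines, key=lambda m: len(load[m]))[:n]
        for m in targets:
            load[m].append(f)
    return load
```

For example, with 100 fractions, 10 machines, and a hypothetical 12 slots per machine, a
fraction receiving far more hits than the rest ends up on all 10 machines, while most cold
fractions keep a single copy. Shrinking the slot count is what forces less popular fractions
to be bumped down to fewer copies.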
Applicability
These algorithms were particularly well suited for web search and similar applications
where the data was mostly static (did not change) except for wholesale replacements when
a new corpus was produced. In contrast, they were inappropriate for traditional applica-
tions. After all, you wouldn't want your payroll system built on a database that dealt with
machine failures by returning partial results. Also, these systems lacked many of the fea-
tures of traditional databases related to consistency and availability.
New distributed computing algorithms enabled new applications one by one. For example,
the desire to provide email as a massive web-based service led to better storage systems.
Over time more edge cases were conquered so that distributed computing techniques
could be applied to more applications.