Information Technology Reference
In-Depth Information
High Availability
With distributed computing techniques, high availability and scaling became closely
coupled. Techniques used to ensure high availability also aided scaling, and vice versa.
Google published many of the distributed computing techniques it invented. Some of the
earliest of these were the Google File System (GFS) and MapReduce.
GFS was a file system that scaled to multiple terabytes of data. Files were stored as
64-megabyte chunks. Each chunk was stored on three or more commodity servers. Ac-
cessing the data was done by talking directly to the machine with the desired data rather
than going through a mediator machine that could be a bottleneck. GFS operations often
happenedinparallel.Forexample,copyingafilewasnotdoneoneblockatatime.Instead,
eachmachinethatstoredapartofthesourcefilewouldbepairedwithanothermachineand
all blocks would be transmitted in parallel. Resiliency could be improved by configuring
GFS to keep more than three replicas of each block, decreasing the chance that all copies
ofthe file wouldbeonadownmachine at anygiven time. Ifonemachine didfail, the GFS
master would use the remaining copies to populate the data elsewhere until the replication
factor was achieved. Increasing the replication level also improved performance. Data ac-
cess was load balanced among all the replicas. More replicas meant more aggregate read
performance.
Search WWH ::




Custom Search