Database Reference
In-Depth Information
oversimplification, since the output from mappers is fed to the reducers, but this is under
the control of the MapReduce system; in this case, it needs to take more care rerunning a
failed reducer than rerunning a failed map, because it has to make sure it can retrieve the
necessary map outputs and, if not, regenerate them by running the relevant maps again.)
So from the programmer's point of view, the order in which the tasks run doesn't matter.
By contrast, MPI programs have to explicitly manage their own checkpointing and recov-
ery, which gives more control to the programmer but makes them more difficult to write.
Volunteer Computing
When people first hear about Hadoop and MapReduce they often ask, “How is it different
from SETI@home?” SETI, the Search for Extra-Terrestrial Intelligence, runs a project
called SETI@home in which volunteers donate CPU time from their otherwise idle com-
puters to analyze radio telescope data for signs of intelligent life outside Earth.
SETI@home is the most well known of many volunteer computing projects; others in-
clude the Great Internet Mersenne Prime Search (to search for large prime numbers) and
Folding@home (to understand protein folding and how it relates to disease).
Volunteer computing projects work by breaking the problems they are trying to solve into
chunks called work units , which are sent to computers around the world to be analyzed.
For example, a SETI@home work unit is about 0.35 MB of radio telescope data, and
takes hours or days to analyze on a typical home computer. When the analysis is com-
pleted, the results are sent back to the server, and the client gets another work unit. As a
precaution to combat cheating, each work unit is sent to three different machines and
needs at least two results to agree to be accepted.
Although SETI@home may be superficially similar to MapReduce (breaking a problem
into independent pieces to be worked on in parallel), there are some significant differen-
ces. The SETI@home problem is very CPU-intensive, which makes it suitable for running
on hundreds of thousands of computers across the world [ 9 ] because the time to transfer the
work unit is dwarfed by the time to run the computation on it. Volunteers are donating
CPU cycles, not bandwidth.
MapReduce is designed to run jobs that last minutes or hours on trusted, dedicated hard-
ware running in a single data center with very high aggregate bandwidth interconnects. By
contrast, SETI@home runs a perpetual computation on untrusted machines on the Internet
with highly variable connection speeds and no data locality.
Search WWH ::




Custom Search