Databases Reference
In-Depth Information
managed by the Master, and the map-reduce job will complete eventually.
Suppose the compute node at which a Map worker resides fails. This fail-
ure will be detected by the Master, because it periodically pings the Worker
processes. All the Map tasks that were assigned to this Worker will have to
be redone, even if they had completed. The reason for redoing completed Map
tasks is that their output destined for the Reduce tasks resides at that compute
node, and is now unavailable to the Reduce tasks. The Master sets the status
of each of these Map tasks to idle and will schedule them on a Worker when
one becomes available. The Master must also inform each Reduce task that the
location of its input from that Map task has changed.
Dealing with a failure at the node of a Reduce worker is simpler. The Master
simply sets the status of its currently executing Reduce tasks to idle.
These
will be rescheduled on another reduce worker later.
2.3
Algorithms Using Map-Reduce
Map-reduce is not a solution to every problem, not even every problem that
profitably can use many compute nodes operating in parallel. As we mentioned
in Section 2.1.2, the entire distributed-file-system milieu makes sense only when
files are very large and are rarely updated in place. Thus, we would not expect
to use either a DFS or an implementation of map-reduce for managing on-
line retail sales, even though a large on-line retailer such as Amazon.com uses
thousands of compute nodes when processing requests over the Web. The reason
is that the principal operations on Amazon data involve responding to searches
for products, recording sales, and so on, processes that involve relatively little
calculation and that change the database. 2 On the other hand, Amazon might
use map-reduce to perform certain analytic queries on large amounts of data,
such as finding for each user those users whose buying patterns were most
similar.
The original purpose for which the Google implementation of map-reduce
was created was to execute very large matrix-vector multiplications as are
needed in the calculation of PageRank (See Chapter 5). We shall see that
matrix-vector and matrix-matrix calculations fit nicely into the map-reduce
style of computing. Another important class of operations that can use map-
reduce effectively are the relational-algebra operations. We shall examine the
map-reduce execution of these operations as well.
2.3.1
Matrix-Vector Multiplication by Map-Reduce
Suppose we have an n×n matrix M , whose element in row i and column j will
be denoted m ij . Suppose we also have a vector v of length n, whose jth element
is v j . Then the matrix-vector product is the vector x of length n, whose ith
2 Remember that even looking at a product you don't buy causes Amazon to remember
that you looked at it.
Search WWH ::




Custom Search