Database Reference
In-Depth Information
The computation cost for algorithms in this family is independent of the number of
groups g , as long as the input to each reducer fits in main memory. The reason is that the
bulk of the computation is the application of function s to the pairs of pictures. No matter
what value g has, s is applied to each pair once and only once. Thus, although the work
of algorithms in the family may be divided among reducers in widely different ways, all
members of the family do the same computation.
2.6.3
A Graph Model for MapReduce Problems
In this section, we begin the study of a technique that will enable us to prove lower bounds
on the replication rate, as a function of reducer size for a number of problems. Our first step
is to introduce a graph model of problems. For each problem solvable by a MapReduce al-
gorithm there is:
(1) A set of inputs;
(2) A set of outputs;
(3) A many-many relationship between the inputs and outputs, which describes which in-
puts are necessary to produce which outputs.
EXAMPLE 2.13 Figure 2.9 shows the graph for the similarity-join problem discussed in Sec-
tion 2.6.2 , if there were four pictures rather than a million. The inputs are the pictures, and
the outputs are the six possible pairs of pictures. Each output is related to the two inputs
that are members of its pair.
Figure 2.9 Input-output relationship for a similarity join
EXAMPLE 2.14 Matrix multiplication presents a more complex graph. If we multiply n × n
matrices M and N to get matrix P , then there are 2 n 2 inputs, m ij and n jk , and there are n 2
outputs p ik . Each output p ik is related to 2 n inputs: m i 1 , m i 2 , . . . , m in and n 1 k , n 2 k , . . . , n nk .
Moreover, each input is related to n outputs. For example, m ij is related to p i 1 , p i 2 , . . . ,
p in . Figure 2.10 shows the input-output relationship for matrix multiplication for the simple
case of 2 × 2 matrices, specifically
Search WWH ::




Custom Search