\sum_{i=1}^{k} q_i \ge p^2/q \qquad (2.2)
The final step, which is step (5), is to divide both sides of Equation 2.2 by p, the number
of inputs. As a result, the left side is equal to the replication rate, and the right side
becomes p/q. That is, we have proved the lower bound on r:

r \ge p/q
As claimed, this shows that the algorithms in the family of Example 2.15 all have a
replication rate that is at most twice the lowest possible replication rate.
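To make the factor of two concrete, the following sketch compares the actual replication rate of the grouping algorithms with the lower bound p/q just proved. It is an illustrative calculation, not code from the text; the function name, and the convention that each input is sent to the g − 1 reducers pairing its group with every other group, are assumptions based on Example 2.15.

```python
# Sketch: replication rate of the grouping algorithms of Example 2.15
# versus the lower bound r >= p/q proved above. The helper name is
# hypothetical; p inputs are split into g equal groups, with one reducer
# for each pair of distinct groups.

def grouping_algorithm_stats(p: int, g: int):
    assert p % g == 0, "assume g divides p evenly"
    r = g - 1            # each input goes to the g-1 reducers pairing its group with another
    q = 2 * p // g       # a reducer receives two groups of p/g inputs each
    lower_bound = p / q  # = g/2, the bound proved above
    return r, q, lower_bound

for g in (10, 100, 1000):
    r, q, lb = grouping_algorithm_stats(p=10**6, g=g)
    # r / lb = 2(g-1)/g, so the actual rate is always below twice the bound
    print(f"g={g}: r={r}, q={q}, lower bound={lb}, ratio={r/lb:.3f}")
```

Since the ratio 2(g − 1)/g approaches 2 from below as g grows, the bound is tight to within a factor of two across the whole family.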
2.6.7 Case Study: Matrix Multiplication
In this section we shall apply the lower-bound technique to one-pass matrix-multiplication
algorithms. We saw one such algorithm in Section 2.3.10, but that is only an extreme case
of a family of possible algorithms. In particular, for that algorithm, a reducer corresponds
to a single element of the output matrix. Just as we grouped inputs in the similarity-join
problem to reduce the communication at the expense of a larger reducer size, we can group
rows and columns of the two input matrices into bands. Each pair consisting of a band of
rows of the first matrix and a band of columns of the second matrix is used by one reducer
to produce a square of elements of the output matrix. An example is suggested by Fig. 2.11.
Figure 2.11 Dividing matrices into bands to reduce communication
In more detail, suppose we want to compute MN = P, and all three matrices are n × n.
Group the rows of M into g bands of n/g rows each, and group the columns of N into g
bands of n/g columns each. This grouping is as suggested by Fig. 2.11. Keys correspond to
two groups (bands), one from M and one from N.
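The banding scheme just described can be simulated in a single process. The sketch below is an illustration under stated assumptions (the function name `banded_matmul` and the use of NumPy are ours, not the text's); it shows that keys are pairs of bands and that each reducer produces one square of P.

```python
import numpy as np
from collections import defaultdict

def banded_matmul(M: np.ndarray, N: np.ndarray, g: int) -> np.ndarray:
    """Compute P = MN by the banding scheme: one reducer per key
    (row band of M, column band of N), each producing a square of P."""
    n = M.shape[0]
    assert M.shape == N.shape == (n, n) and n % g == 0
    b = n // g  # band width

    # "Map": route whole bands to keys (i, j). Sending a band to all g keys
    # that mention it mirrors the g key-value pairs generated per element.
    reducers = defaultdict(dict)
    for i in range(g):
        for j in range(g):
            reducers[(i, j)]["M"] = M[i * b:(i + 1) * b, :]   # row band i of M
            reducers[(i, j)]["N"] = N[:, j * b:(j + 1) * b]   # column band j of N

    # "Reduce": each reducer multiplies its two bands to produce one
    # b-by-b square of the output matrix.
    P = np.empty((n, n))
    for (i, j), bands in reducers.items():
        P[i * b:(i + 1) * b, j * b:(j + 1) * b] = bands["M"] @ bands["N"]
    return P
```

Each reducer holds one band of n²/g elements from each matrix, so the reducer size is q = 2n²/g, and each element is shipped to g reducers, giving replication rate r = g.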
The Map Function For each element of M, the Map function generates g key-value pairs.
The value in each case is the element itself, together with its row and column number so
it can be identified by the Reduce function. The key is the group to which the element
belongs, paired with any of the groups of the matrix N. Similarly, for each element of N, the