Database Reference
In-Depth Information
element will be turned into many key-value pairs. The keys will be pairs ( i, k ), where i is a
row of M and k is a column of N . Here is a synopsis of the Map and Reduce functions.
The Map Function : For each element m ij of M , produce all the key-value pairs (( i, k ) , ( M,
j, m ij )) for k = 1 , 2 , . . . , up to the number of columns ( of N . Similarly, for each element
n jk of N , produce all the key-value pairs ( i, k ) , ( N, j, n jk )) for i = 1 , 2 , . . . , up to the number
of rows of M . As before, M and N are really bits to tell which of the two relations a value
comes from.
The Reduce Function Each key ( i, k ) will have an associated list with all the values ( M,
j, m ij ) and ( N, j, n jk ), for all possible values of j . The Reduce function needs to connect the
two values on the list that have the same value of j , for each j . An easy way to do this step
is to sort by j the values that begin with M and sort by j the values that begin with N , in
separate lists. The j th values on each list must have their third components, m ij and n jk ex-
tracted and multiplied. Then, these products are summed and the result is paired with ( i, k )
in the output of the Reduce function.
You may notice that if a row of the matrix M or a column of the matrix N is so large that
it will not fit in main memory, then the Reduce tasks will be forced to use an external sort
to order the values associated with a given key ( i, k ). However, in that case, the matrices
themselves are so large, perhaps 10 20 elements, that it is unlikely we would attempt this
calculation if the matrices were dense. If they are sparse, then we would expect many fewer
values to be associated with any one key, and it would be feasible to do the sum of products
in main memory.
2.3.11
Exercises for Section 2.3
EXERCISE 2.3.1 Design MapReduce algorithms to take a very large file of integers and pro-
duce as output:
(a) The largest integer.
(b) The average of all the integers.
(c) The same set of integers, but with each integer appearing only once.
(d) The count of the number of distinct integers in the input.
EXERCISE 2.3.2 Our formulation of matrix-vector multiplication assumed that the matrix
M was square. Generalize the algorithm to the case where M is an r -by- c matrix for some
number of rows r and columns c .
Search WWH ::




Custom Search