Database Reference
In-Depth Information
column  k , then the product P = MN is the matrix P with element p ik in row i and
column k , where p
= .
In the MapReduce framework, we use two map-reduce phases to perform the
matrix multiplication,* which forms each iteration. In the first phase, the map
extracts the columns of M and the rows of N , and the reduce joins column j of M and
row j of N together. In the second phase, the map multiplies a column vector with the
joined row vector to obtain a matrix, and the reduce sums these matrices to obtain
the final result matrix. The map and reduce operations in each iteration are described
as follows.
m n
ik
ij
jk
j
MPI Map 1: For each key ( i , j ), send each matrix element m ij of M to the KV
j , ( M , i , m ij )〉. For each key ( j , k ), send each matrix element n jk of N to the
KV 〈 j , ( N , k , n jk )〉.
MPI Reduce 1: For each key j , collect its list of associated values
j
(
) (
)
(
)
(
)
,
Mim
, ,
,
Mi m
,,
,
,
N kn
,,
,
Nk n
,
,
, .
1
ij
2
ij
1
jk
2
j
k 2
1
2
1
MPI Map 2: Take the output KV of Reduce 1. For each value that comes from
M , say ( M , i , m ij ), and each value that comes from N , say ( N , k , n jk ), produce
the KV 〈( i , k ), m ij n jk 〉. It will output all the permutations (, ),
ik mn
ij
,
11
jk
1
1
21 2 1 , …
MPI Reduce 2: For each key ( i , k ), produce the sum of the list of values associ-
ated with this key. The result KV is 〈( i , k ), p ik 〉, where p
(, ),
ik mn
ij
,…, (, ),
ik mn
ij jk
12
jk
1
2
= .
m n
ik
ij
jk
j
3.1.3 P roPerties oF i iterative C omPutations in m aP r eDuCe
Through these examples, we have several observations about writing iterative algo-
rithms in Hadoop MapReduce.
• Iterative Data Flow. MapReduce exploits a batched processing model. By
using MapReduce to implement an iterative algorithm, the iterative loop is
missing. On the other hand, to emulate an iterative data flow, an iterative
algorithm is implemented by a series of MapReduce jobs. Each iteration
corresponds to one or more jobs. The next job's input is the previous job's
output. The map/reduce operations in all iterations are the same.
• Dynamic State Data vs. Static Structure Data. Iterative computation
involves two kinds of data, the state data and the structure data . During the
iterative computation, the state data is iteratively updated (e.g., the PageRank
scores in PageRank, the centroids in K-means, or the matrix multiplier N  =  M k −1
in MPI), while the structure data is iteration-invariant (e.g., the Web linkage
graph in PageRank, the point coordinates in K-means, or the basic matrix M
in MPI). To process these data in MapReduce, both the dynamic state data and
the static structure data are represented by KVs (i.e., state KVs and structure
* http://infolab.stanford.edu/ullman/mmds/book.pdf.
Search WWH ::




Custom Search