KVs). Then, a series of MapReduce jobs update the state KVs based on the
structure KVs.
Use of State/Structure KVs. In each iteration, a mapper operating on a
structure KV produces intermediate KVs based on the state KVs, and a reducer
updates a state KV. For example, in PageRank, the mapper operating on
each node needs the node's PageRank score (state KV) and its outgoing neighbors
(structure KV), and the reducer updates a node's PageRank score (state KV);
in K-means, the mapper operating on each point needs the coordinates of all
centroids (state KVs) and the point's own coordinate (structure KV), and the reducer
updates a centroid's coordinate (state KV); in MPI, mapper 2, operating on
each column of M, needs the j-th column of N (state KV) and the j-th column of
M (structure KV), and reducer 2 updates a column of N (state KV).
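As a concrete illustration of the PageRank case, the sketch below simulates one map/reduce round in plain Python. This is a hypothetical simulation, not Hadoop code; the toy graph, the damping factor constant, and all function names are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical sketch (not Hadoop code) of one PageRank iteration.
# The mapper reads a structure KV (node -> out-neighbors) together with
# a state KV (node -> score); the reducer updates a state KV.
DAMPING = 0.85

def map_node(node, neighbors, score):
    # structure KV: (node, neighbors); state KV: (node, score)
    for dst in neighbors:
        yield dst, score / len(neighbors)

def reduce_node(node, contributions, num_nodes):
    # emits the updated state KV for this node
    return node, (1 - DAMPING) / num_nodes + DAMPING * sum(contributions)

def pagerank_iteration(structure, state):
    intermediate = defaultdict(list)
    for node, neighbors in structure.items():
        for dst, contrib in map_node(node, neighbors, state[node]):
            intermediate[dst].append(contrib)
    return dict(reduce_node(n, intermediate.get(n, []), len(structure))
                for n in structure)

# toy graph with made-up edges; every node has at least one out-neighbor
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = {n: 1 / len(graph) for n in graph}
for _ in range(10):
    scores = pagerank_iteration(graph, scores)
```

Note that the structure KVs (`graph`) never change across iterations, while the state KVs (`scores`) are rewritten by every job; this is exactly the asymmetry the text exploits.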
Mappings between Reducers and Mappers. The map function operates
on a structure KV and produces intermediate results based on one or more
state KVs. Accordingly, a structure KV can have different mappings to the
state KVs. For example, in PageRank, the mapper operates on a node's
PageRank score and the same node's outgoing neighbors, which is a "one-to-one"
mapping. In K-means, the mapper operates on a point and all centroids,
which is a "one-to-all" mapping. In MPI, mapper 1 operates on an
entry (i, j) of M and the entries of the j-th row of N, which is a "one-to-more"
mapping. To sum up, a mapper operates on a structure KV, and a reducer
updates a state KV; hence, the mappings between a reducer of the i-th job
and the mappers of the (i + 1)-th job can be different.
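The "one-to-all" mapping in K-means can be made concrete with a short sketch. This is a hypothetical Python simulation, not Hadoop code; the sample points, centroid IDs, and function names are invented for illustration.

```python
import math
from collections import defaultdict

# Hypothetical sketch (not Hadoop code) of the "one-to-all" mapping in
# K-means: every mapper operating on a point (structure KV) must read
# ALL centroid coordinates (state KVs); each reducer updates exactly
# one centroid (state KV).

def map_point(point, centroids):
    # one-to-all: this single mapper needs every state KV (all centroids)
    nearest = min(centroids, key=lambda c: math.dist(point, centroids[c]))
    yield nearest, point

def reduce_centroid(cid, points):
    # updates one state KV: the centroid's new coordinate (mean of members)
    n = len(points)
    return cid, tuple(sum(coord) / n for coord in zip(*points))

def kmeans_iteration(points, centroids):
    groups = defaultdict(list)
    for p in points:
        for cid, pt in map_point(p, centroids):
            groups[cid].append(pt)
    return dict(reduce_centroid(c, pts) for c, pts in groups.items())

# made-up data: two well-separated clusters and two initial centroids
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids = {"c0": (0.0, 0.5), "c1": (10.0, 10.5)}
centroids = kmeans_iteration(points, centroids)
```

Contrast this with the PageRank mapper, which only ever touches the state KV of the same node it is operating on ("one-to-one").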
These observations capture the essential properties of iterative computations in
MapReduce. Programmers can first identify the state/structure KVs and then
define the iterative data flow to design their MapReduce programs. Moreover, these
properties guide the design of an efficient programming model for iterative
computations; iMapReduce is proposed based on these properties.
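The iterative data flow described above can be sketched as a generic driver loop. The skeleton below is a hypothetical Python simulation, not the Hadoop API; the `run_job`/`iterate` names and the toy averaging example are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical driver-loop skeleton (not the real Hadoop API) for the
# iterative data flow: the main loop runs one MapReduce "job" per
# iteration, feeding each job's output state KVs into the next job.

def run_job(map_fn, reduce_fn, structure, state):
    # one job: map over structure KVs (reading state KVs), then reduce
    # each intermediate key group into a new state KV
    intermediate = defaultdict(list)
    for key, value in structure.items():
        for out_key, out_val in map_fn(key, value, state):
            intermediate[out_key].append(out_val)
    return {k: reduce_fn(k, vs) for k, vs in intermediate.items()}

def iterate(map_fn, reduce_fn, structure, state, num_iters):
    for _ in range(num_iters):  # a fresh job is started every iteration
        state = run_job(map_fn, reduce_fn, structure, state)
    return state

# toy example: each node sends its value to its neighbors, and each
# node's new state is the mean of the values it receives
structure = {"a": ["a", "b"], "b": ["a", "b"]}
state = {"a": 0.0, "b": 8.0}
final = iterate(
    lambda n, nbrs, st: [(d, st[n]) for d in nbrs],
    lambda n, vals: sum(vals) / len(vals),
    structure, state, 5)
```

The skeleton makes the cost structure visible: `structure` is passed to every job unchanged, while only `state` evolves, which is what motivates the efficiency discussion in the next subsection.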
3.1.4 Limited Support of Iterative Computations by MapReduce
As described above, MapReduce can be used to implement iterative algorithms.
However, MapReduce has limited support for iterative computations. We list three
performance penalties of implementing iterative algorithms in Hadoop MapReduce.
1. The operations in all iterations are the same. Nevertheless, the MapReduce
implementation starts one or more new jobs for each iteration, which involves
repeated task initializations and cleanups. Moreover, these jobs have to load
the input data from DFS and dump the output data to DFS repeatedly. Hadoop
is known to incur around 20 seconds of start-up overhead per job [7], so a
series of jobs results in unnecessary job startup overhead.
2. The map tasks in an iteration cannot start before all the reduce
tasks in the previous iteration have finished. The main loop in the MapReduce
implementation requires the completion of the previous iteration's job before
starting the next iteration's job. However, the map tasks should be started as soon as