and repeats the process. However, one design goal of iMapReduce is to support
iterative processing in a single job. That is, the reduce output is fed to the map
for the next iteration, and the map/reduce operations keep executing until
the iteration terminates. Figure 3.1b shows the data flow in iMapReduce.
Only one job is scheduled. The dashed line indicates that data is loaded from
DFS only once, in the initialization stage, and that the output data is written
to DFS only once, when the iteration terminates. By supporting iterative
processing in one job, we can avoid the job startup overhead.
Goal 2: Executing map tasks asynchronously. In MapReduce implementations,
two synchronization barriers exist: one during the map-to-reduce shuffle and
one between MapReduce jobs. Because of the barrier between jobs, the map
tasks of the next iteration's job cannot start until the previous iteration's
job completes, which requires all of its reduce tasks to complete. However,
each map task takes its input from a reduce task, so it can start executing as
soon as the input from that reduce task is ready, rather than waiting for all
reduce tasks to complete. iMapReduce therefore schedules the execution of map
tasks asynchronously. By enabling the asynchronous execution of the map
tasks, we can avoid the synchronization overhead.
Goal 3: Separating dynamic state data from static structure data.
Even though the structure data is unchanged from iteration to iteration,
MapReduce implementations reload and reshuffle it in each iteration, which
incurs considerable communication overhead.
iMapReduce differentiates the iteratively updated state data from the static
structure data. Only the state data is iterated and communicated in each
iteration, while the structure data is kept locally across iterations. By
separating the dynamic state data from the static structure data, we can
avoid the communication overhead of transferring the static structure data.
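The saving behind Goal 2 can be illustrated with a small timing calculation. This is only a sketch under stated assumptions: map task i is paired with reduce task i, every task has its own slot, and the finish times and durations below are hypothetical numbers chosen for illustration, not measurements from iMapReduce.

```python
def map_phase_finish_with_barrier(reduce_finish, map_duration):
    """Finish time of the map phase when every map task must wait for the
    slowest reduce task of the previous iteration (barrier between jobs)."""
    barrier = max(reduce_finish)
    return max(barrier + d for d in map_duration)


def map_phase_finish_async(reduce_finish, map_duration):
    """Finish time of the map phase when map task i starts as soon as its
    own paired reduce task i has finished (no barrier between iterations)."""
    return max(r + d for r, d in zip(reduce_finish, map_duration))


# Hypothetical values (seconds): when each reduce task of the previous
# iteration finishes, and how long its paired map task then runs.
reduce_finish = [4.0, 6.0, 10.0]
map_duration = [5.0, 3.0, 2.0]

print(map_phase_finish_with_barrier(reduce_finish, map_duration))  # 15.0
print(map_phase_finish_async(reduce_finish, map_duration))         # 12.0
```

With the barrier, the longest map task cannot start before time 10; without it, that task runs while the slowest reduce task is still finishing, so the map phase ends earlier. The asynchronous finish time can never exceed the barrier one, because each map task starts no later than it would behind the barrier.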
Based on these design goals, the iMapReduce programming model is designed, as
shown in Figure 3.2. To support iterative processing, an internal loop is constructed
that feeds the reduce output directly to the map as input, so that the data can
be iterated automatically. By constructing this internal loop, the iterative algorithm
[Figure 3.2 labels: Structure data <SK, SV> and State data <DK, DV> enter a Join; the joined <SK, SV> and <DK, DV> pairs go to Map; Shuffling carries <IK, IV> to Reduce; Reduce emits <DK, DV>. Legend: structure dataflow, state dataflow.]
FIGURE 3.2
iMapReduce programming model.
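The data flow in Figure 3.2 can be sketched in a few lines of single-process Python. This is an illustrative toy, not the iMapReduce API: the function names `map_fn`, `reduce_fn`, and `run`, the shortest-path example, the sample graph, and the "stop when the state no longer changes" termination test are all assumptions introduced here. The point it shows is that only the state data <DK, DV> passes through the shuffle and loops back, while the structure data <SK, SV> is looked up locally in every iteration.

```python
import math
from collections import defaultdict

# Static structure data <SK, SV>: node -> list of (neighbor, edge weight).
# Hypothetical sample graph; it is read locally and never shuffled.
STRUCTURE = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 3)],
    "d": [],
}


def map_fn(node, dist, edges):
    """Take one joined record (state + structure) and emit <IK, IV> pairs."""
    yield node, dist  # keep the node's own current distance
    if dist < math.inf:
        for neighbor, weight in edges:
            yield neighbor, dist + weight


def reduce_fn(node, candidates):
    """Collapse all candidate distances for a node into a new <DK, DV>."""
    return node, min(candidates)


def run(structure, source, max_iterations=100):
    # Dynamic state data <DK, DV>: node -> shortest distance found so far.
    state = {node: (0 if node == source else math.inf) for node in structure}
    for iteration in range(1, max_iterations + 1):
        shuffled = defaultdict(list)  # shuffle: group <IK, IV> pairs by key
        for node, dist in state.items():
            # Join: pair each state record with its local structure record.
            for ik, iv in map_fn(node, dist, structure[node]):
                shuffled[ik].append(iv)
        new_state = dict(reduce_fn(k, vs) for k, vs in shuffled.items())
        if new_state == state:  # assumed termination test: nothing changed
            return state, iteration
        state = new_state  # internal loop: reduce output is the next map input
    return state, max_iterations


distances, iterations = run(STRUCTURE, "a")
print(distances, iterations)  # {'a': 0, 'b': 1, 'c': 3, 'd': 6} 4
```

In a distributed setting the join would be local to each map task, with state and structure records partitioned by the same key; here a dictionary lookup stands in for that.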