their input data are available. This limitation results in unnecessary
synchronization overhead.
3. The MapReduce framework cannot separate the state data from the structure
data. The structure data is shuffled between map and reduce in every
iteration, even though it remains unchanged across all iterations. This
results in unnecessary communication overhead.
The synchronization overhead and the static data communication overhead have been
studied and measured in [21]. iMapReduce aims to address these limitations by providing
an efficient distributed computing framework for implementing iterative algorithms.
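To make the third limitation concrete, the following minimal Python simulation (a sketch, not Hadoop code; all names are illustrative) runs one iteration of a PageRank-style computation under the plain MapReduce model. Because the reducer must reassemble each node's record for the next job, the mapper is forced to re-emit the adjacency lists (structure data) alongside the rank contributions (state data), so the unchanged structure data crosses the shuffle in every iteration:

```python
# Illustrative simulation of one PageRank-style MapReduce iteration.
# "State" data (ranks) changes each iteration; "structure" data
# (adjacency lists) never does, yet both are shuffled every time.
from collections import defaultdict

def map_phase(records):
    """records: (node, (rank, out_links)) pairs read from DFS."""
    emitted = []
    for node, (rank, links) in records:
        # State data: rank contributions sent to each neighbor.
        for dst in links:
            emitted.append((dst, ("state", rank / len(links))))
        # Structure data: the adjacency list must be re-emitted so the
        # reducer can rebuild (node, (rank, links)) for the next job,
        # even though it is identical in every iteration.
        emitted.append((node, ("structure", links)))
    return emitted

def reduce_phase(emitted, damping=0.85):
    grouped = defaultdict(list)
    for key, value in emitted:
        grouped[key].append(value)
    result = []
    for node, values in grouped.items():
        links, total = [], 0.0
        for kind, v in values:
            if kind == "structure":
                links = v
            else:
                total += v
        result.append((node, ((1 - damping) + damping * total, links)))
    return result

graph = [("a", (1.0, ["b", "c"])), ("b", (1.0, ["c"])), ("c", (1.0, ["a"]))]
shuffled = map_phase(graph)
# One unchanged structure record per node crosses the shuffle each iteration:
structure_records = [kv for kv in shuffled if kv[1][0] == "structure"]
next_graph = reduce_phase(shuffled)
```

Separating the static structure data from the iterated state data, so that only the latter is shuffled, is precisely the optimization iMapReduce targets.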
3.2 PROGRAMMING MODEL
The observations and analysis above expose the limitations of implementing
iterative algorithms in Hadoop MapReduce. iMapReduce is proposed to overcome
these performance penalties. The design goals of iMapReduce are as follows:
Goal 1: Supporting iterative processing in one job. In the MapReduce
implementation, a series of MapReduce jobs, each consisting of map tasks and
reduce tasks, is scheduled. Figure 3.1a shows the data flow in the MapReduce
implementation. Each MapReduce job has to load its input data from DFS before
the map operation and dump its output data to DFS after the reduce operation.
In the next iteration, the map function loads the iterated data from DFS again
[Figure: (a) MapReduce data flow, a chain of jobs (Job1, Job2, ...) each reading from and writing to DFS between its Map and Reduce phases; (b) iMapReduce data flow, a single job in which data cycles between Map and Reduce without intermediate DFS round-trips.]
FIGURE 3.1
Data flow of (a) MapReduce and (b) iMapReduce.
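The job-chaining driver pattern of Figure 3.1a can be sketched as follows. This is a hedged, self-contained Python simulation, not real Hadoop code: the in-memory `dfs` dictionary, `run_mapreduce_job`, and the toy halving computation are all illustrative stand-ins. It shows how each iteration becomes a separate job with its own DFS load before map and DFS dump after reduce:

```python
# Sketch of the per-iteration job chaining that plain MapReduce forces
# on iterative algorithms. Every iteration pays one DFS load and one
# DFS dump, which iMapReduce's single-job design avoids.
from collections import defaultdict

def run_mapreduce_job(dfs, in_path, out_path, mapper, reducer, io_counter):
    records = dfs[in_path]          # load input from "DFS" before map
    io_counter["reads"] += 1
    grouped = defaultdict(list)
    for k, v in records:
        for k2, v2 in mapper(k, v):
            grouped[k2].append(v2)
    output = [reducer(k, vs) for k, vs in grouped.items()]
    dfs[out_path] = output          # dump output to "DFS" after reduce
    io_counter["writes"] += 1

# Toy iterated computation (halve each value); it stands in for any
# iterative algorithm such as PageRank or k-means.
def mapper(k, v):
    yield k, v / 2.0

def reducer(k, vs):
    return k, sum(vs)

dfs = {"/iter0": [("x", 8.0), ("y", 4.0)]}
io = {"reads": 0, "writes": 0}
for i in range(3):  # the driver schedules one new MapReduce job per iteration
    run_mapreduce_job(dfs, f"/iter{i}", f"/iter{i + 1}", mapper, reducer, io)
# After 3 iterations: 3 DFS loads and 3 DFS dumps of the iterated data.
```

Under this pattern the DFS traffic grows linearly with the number of iterations; folding the loop into one persistent job, as in Figure 3.1b, removes the intermediate loads and dumps.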
 