iMapReduce - Large Scale and Big Data: Processing and Management - page 113

Database Reference

In-Depth Information

and repeats the process. However, one design goal of iMapReduce is to support

iterative processing in a single job. That is, the reduce output is fed to the map

for next round iteration, and the map/reduce operations are kept executing till

the iteration is terminated. Figure 3.1b shows the data flow in iMapReduce.

Only one job is scheduled. The dashed line indicates that the data loading from

DFS happens only once in the initialization stage, and the output data are writ-

ten to DFS only once when the iteration terminates. By supporting iterative

processing in one job, we can avoid the job startup overhead.

•

Goal 2: Executing map tasks asynchronously. In MapReduce implementa-

tions, two synchronization barriers exist, during map-to-reduce shuffles and

between MapReduce jobs, respectively. Due to the synchronization barrier

between jobs, the map tasks of the next iteration job cannot start before the

completion of the previous iteration job, which requires the completion of all

reduce tasks. However, a map task is fed with a reduce task. The map task

can start its execution as soon as its input from a reduce task is ready, rather

than waiting for the completion of all reduce tasks. iMapReduce schedules the

execution of map tasks asynchronously. By enabling the asynchronous execu-

tion of the map tasks, we can avoid the synchronization overhead.

•

Goal 3: Separating dynamic state data from static structure data.

Even though the structure data is unchanged from iteration to iteration, the

MapReduce implementations reload and reshuffle the unchanged structure

data in each iteration, which poses considerable communication overhead.

iMapReduce differentiates the iteratively updated state data from the static

structure data. Only the state data is iterated and communicated in each

iteration, while the structure data is maintained locally over iterations. By

separating the dynamic state data from the static structure data, we can

avoid the communication overhead on transferring the static structure data.

Based on these design goals, iMapReduce programming model is designed, which

is shown in Figure 3.2. To support iterative processing, an internal loop that makes

the reduce output directly fed as the map input is constructed, such that the data can

be iterated automatically. By constructing this internal loop, the iterative algorithm

< SK, SV >

< DK, DV >

Structure

data

State

data

Join

< SK, SV >

< DK, DV >

Map

Shuing

< IK, IV >

Reduce

Structure dataflow

< DK, DV >

State dataflow

FIGURE 3.2

iMapReduce programming model.

Next Page

Large Scale and Big Data: Processing and Management

Search WWH ::

Custom Search

Home